Kernel density plot

From Sustainability Methods
Revision as of 18:05, 1 March 2021 by Ollie

In short: This entry introduces the kernel density plot and shows how to draw it with R’s ggplot2 package. A density plot shows the distribution of a single quantitative variable as a smooth curve, which makes it easy to see which values occur frequently and which are relatively rare. The x-axis represents the values of the variable and the y-axis represents the estimated density. The area under the curve equals 1.

Packages used: gapminder, ggplot2

# Install (only needed once) and load the gapminder and ggplot2 packages
install.packages(c("gapminder", "ggplot2"))
library(gapminder)
library(ggplot2)
# A glimpse of the gapminder dataset
head(gapminder)
File 2021-03-01 at 18.36.14.png
# ?gapminder       # open the help page for the dataset
# View(gapminder)  # open the dataset in a spreadsheet-style viewer
# Use base R's plot() function to view the distribution of GDP per capita
plot(density(gapminder$gdpPercap))
File 2021-03-01 at 18.42.26.png
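
The statement that the area under the curve equals 1 can be checked numerically. The following is a minimal sketch using only base R. The simulated values in x are an illustrative stand-in (not the gapminder data), and the trapezoidal rule is just one of several ways to approximate the area.

```r
# Simulated stand-in values (assumption: any numeric vector would work here)
set.seed(42)
x <- rnorm(500, mean = 60, sd = 10)

# Estimate the density on a grid of x values
d <- density(x)

# Approximate the area under the curve with the trapezoidal rule
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area
```

The result is close to 1 rather than exactly 1, because density() evaluates the curve on a finite grid.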

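Bandwidth controls how smooth the estimated curve is: a small bandwidth shows more detail but gives a noisier curve, while a large bandwidth gives a smoother curve with less detail. The bandwidth can be changed with the bw argument of the geom_density() function (it is set outside aes()). By default, geom_density() uses the "nrd0" rule of thumb, whose value for life expectancy can be viewed as: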

bw.nrd0(gapminder$lifeExp)

#Output: [1] 2.624907
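
According to R's help page for bw.nrd0, the default follows a rule of thumb: 0.9 times the smaller of the standard deviation and the interquartile range divided by 1.34, times the sample size to the power of -1/5. The sketch below reproduces this by hand. The simulated vector x and the name manual_bw are illustrative assumptions.

```r
# Simulated stand-in values (assumption: not the gapminder data)
set.seed(1)
x <- rnorm(200, mean = 60, sd = 12)
n <- length(x)

# Rule of thumb as described in ?bw.nrd0
manual_bw <- 0.9 * min(sd(x), IQR(x) / 1.34) * n^(-1/5)

# Compare the manual value with the built-in function
all.equal(manual_bw, bw.nrd0(x))
```

Because the bandwidth shrinks only slowly as the sample size grows (power of -1/5), larger datasets get somewhat more detailed curves by default.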

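A basic density plot of life expectancy, pooled over all years and countries, can be drawn with ggplot2 as follows. Here the bandwidth is set to 1, which is narrower than the default and therefore shows more detail: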

ggplot(gapminder, aes(x = lifeExp))+
   geom_density(fill = "red", bw = 1)+
   labs(title = "Life expectancy over the years")
File 2021-03-01 at 18.49.00.png
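
To get a feeling for how strongly the bandwidth affects the shape of the curve, one can count the local peaks of the estimate for a very narrow and a very wide bandwidth. This is a minimal sketch in base R. The simulated data and the helper count_peaks() are hypothetical and used for illustration only.

```r
# Simulated stand-in data with two clusters (assumption: not the gapminder data)
set.seed(123)
x <- c(rnorm(150, mean = 50, sd = 5), rnorm(150, mean = 75, sd = 5))

# Hypothetical helper: count the local maxima of an estimated density curve
count_peaks <- function(d) sum(diff(sign(diff(d$y))) == -2)

peaks_narrow <- count_peaks(density(x, bw = 0.2))  # very little smoothing
peaks_wide   <- count_peaks(density(x, bw = 5))    # strong smoothing

c(narrow = peaks_narrow, wide = peaks_wide)
```

The narrow bandwidth produces many small bumps that mostly reflect individual observations, whereas the wide bandwidth leaves only the broad structure of the data.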

The distribution of life expectancy on each continent can be shown by mapping the "continent" variable to the fill and color aesthetics.

ggplot(gapminder, aes(x = lifeExp))+
    geom_density(aes(fill = continent, color = continent), alpha = 0.5)+
    scale_fill_discrete(name = "Continent")+
    scale_color_discrete(name = "Continent")+
    labs(title = "Life expectancy over the years")
File 2021-03-01 at 18.52.47.png
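
When reading a grouped density plot, it helps to remember that by default each group's curve is estimated and normalized separately, so every curve has an area of about 1 no matter how many observations the group contains. The sketch below illustrates this in base R with two simulated groups of very different size. The group names and the helper area_under() are hypothetical.

```r
# Two simulated groups of unequal size (assumption: stand-ins for continents)
set.seed(7)
groups <- list(
  small = rnorm(50,  mean = 55, sd = 8),
  large = rnorm(500, mean = 70, sd = 8)
)

# Hypothetical helper: area under a group's density curve (trapezoidal rule)
area_under <- function(values) {
  d <- density(values)
  sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
}

# Each group is estimated on its own, so both areas are close to 1
areas <- sapply(groups, area_under)
round(areas, 2)
```

The height of a curve therefore says nothing about the size of a group. It only describes how the values within that group are distributed.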

Faceting

With faceting, the data are split into groups that are plotted in separate panels side by side for easier comparison. The following code produces the plot below:

ggplot(gapminder, aes(x = lifeExp))+
    geom_density(aes(fill = continent, color = continent),alpha = 0.5)+
    scale_fill_discrete(name = "Continent")+
    scale_color_discrete(name = "Continent")+
    labs(title = "Life expectancy over the years")+
    facet_wrap(~ continent)
File 2021-03-01 at 18.57.00.png
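
Because all panels share the same x-axis, stacking them in a single column can make shifts between continents easier to compare. The following is a minimal sketch, assuming the gapminder and ggplot2 packages are installed. The choice of ncol = 1, the axis labels, and the object name p are illustrative.

```r
library(gapminder)
library(ggplot2)

# Stack the panels vertically so the shared x-axis lines up across continents
p <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_density(aes(fill = continent, color = continent), alpha = 0.5) +
  facet_wrap(~ continent, ncol = 1) +
  labs(title = "Life expectancy by continent",
       x = "Life expectancy (years)", y = "Density") +
  theme(legend.position = "none")  # the panel labels already name each continent

p
```

The legend is dropped here because each panel strip already carries the continent name.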

References:

  1. Lecture slides.
  2. "Histograms and Density Plots in Python" by Will Koehrsen
  3. Kabacoff, R. (2018). Data visualization with R. USA: Wesleyan University.

The author of this entry is Archana Maurya.