Stacked Area Plot

From Sustainability Methods
Revision as of 15:44, 20 March 2022 by Ollie.

Note: This entry revolves specifically around Stacked Area Plots. For more general information on quantitative data visualisation, please refer to Introduction to statistical figures.

In short: This entry introduces the Stacked Area Plot and shows how to create it with R's ggplot2 package. An Area Plot is a Line Plot in which the area under the line is colored. A Stacked Area Plot extends the Area Plot to multiple data series, which are stacked on top of each other.

Overview

A Stacked Area Plot displays quantitative values for multiple data series, where a data series is a set of data represented as one line in the plot. The plot consists of lines with the area below each line colored (filled) to represent the quantitative value of that data series. It shows the evolution of a numeric variable for several groups of a dataset: the groups are stacked on top of each other, so the upper edge of the plot shows the total and its evolution is easy to read. A Stacked Area Plot is therefore used both to track the total value of the data series and to understand how that total breaks down into the individual data series. Comparing the heights (thicknesses) of the bands gives a general idea of how much each subgroup contributes to the total compared to the others.
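The stacking described above can be sketched in base R. This is a minimal illustration with a small made-up dataset (the data frame `df`, the groups "A", "B" and "C" and all values are hypothetical): at each x value, the upper edge of a band is the running sum of the values of all groups up to and including it, so the upper edge of the top band equals the total. In this sketch the first group is placed at the bottom; a plotting library may stack the groups in a different order.

```r
# Small made-up dataset in long format: one row per x value and group
df <- data.frame(
  x     = rep(1:3, times = 3),
  group = rep(c("A", "B", "C"), each = 3),
  value = c(2, 3, 4,   # series A
            1, 1, 2,   # series B
            3, 2, 1)   # series C
)

# Wide view: one row per x value, one column per data series
wide <- xtabs(value ~ x + group, data = df)

# Upper edge of each band: running sum across the groups at each x value
ymax <- t(apply(wide, 1, cumsum))
# Lower edge of each band: upper edge of the band below (0 for the bottom band)
ymin <- ymax - wide

# Total per x value: equals the upper edge of the top band ("C")
total <- rowSums(wide)
print(ymax)
print(total)
```

The difference `ymax - ymin` is the thickness of each band, which is the only place where a single series' own value is visible.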

Best practices

Limit the number of data series. Every additional data series needs its own color, and with many series the colors become hard to tell apart and the individual bands become thin and hard to read.

Consider the order of the lines. The overall shape of the plot is the same regardless of the order in which the data series are stacked, but a good choice of order makes the plot easier to read. For example, the series at the bottom sits on a flat baseline, so its evolution is the easiest to follow.
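The point that the order of the data series does not change the overall shape can be checked with a small base-R sketch. The matrix `wide`, the helper `stack_edges()` and the values are hypothetical; the sketch stacks the same three series in two opposite orders and compares the resulting edges.

```r
# Made-up data: one row per x value, one column per data series
wide <- cbind(A = c(2, 3, 4), B = c(1, 1, 2), C = c(3, 2, 1))

# Hypothetical helper: upper edge of each band, stacking columns left to right
stack_edges <- function(m) t(apply(m, 1, cumsum))

original <- stack_edges(wide)                            # A at the bottom
reversed <- stack_edges(wide[, rev(colnames(wide))])     # C at the bottom

# Top edge of the plot (the total) is identical in both orders
print(original[, ncol(original)])
print(reversed[, ncol(reversed)])

# Lower edge of series B depends on which series lies below it
print(original[, "B"] - wide[, "B"])   # sits on A
print(reversed[, "B"] - wide[, "B"])   # sits on C
```

Only the position of each band changes with the order; the band thicknesses and the total stay the same.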

Some issues with stacking

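Stacked Area Plots must be applied carefully because they have some limitations. They are appropriate for studying the evolution of the total and the relative proportions of the data series, but not for studying the evolution of each individual data series. Every band except the bottom one sits on top of the bands below it, so its edges move whenever the series below it change, even if its own values stay constant.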
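A small base-R sketch with made-up numbers illustrates this limitation (the vectors `a` and `b` are hypothetical): series B is constant, yet the line forming the upper edge of its band rises and falls because it is drawn on top of series A.

```r
a <- c(1, 4, 2, 5)   # bottom series, fluctuating
b <- c(3, 3, 3, 3)   # series stacked on top of A, constant

b_lower <- a         # lower edge of B's band
b_upper <- a + b     # upper edge of B's band (the visible line)

print(b_upper)            # 4 7 5 8: the line for B moves up and down
print(b_upper - b_lower)  # 3 3 3 3: only the band thickness shows B is constant
```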

To get a clearer understanding, let us plot an example of a Stacked Area Plot in R.

Plotting in R

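In R, Stacked Area Plots can be created with the function geom_area() from the ggplot2 package. By default, geom_area() stacks the areas of the different groups on top of each other (position = "stack"). Its basic syntax is:

Syntax: ggplot(Data, aes(x = x_variable, y = y_variable, fill = group_variable)) + geom_area()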

Parameters:

  • Data: the data frame containing the whole dataset with all data series, in long format (one row per combination of x value and group).
  • x: the numeric variable mapped to the x axis (for example, time).
  • y: the numeric variable mapped to the y axis; the y values of the different groups are stacked on top of each other.
  • fill: the grouping column of Data; each group is drawn as a separate, differently colored area.
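The parameters above assume a dataset in long format. The following sketch builds a small made-up data frame in that layout (`sales`, `year`, `product` and `units` are hypothetical names) and passes it to geom_area(); the plotting part only runs if ggplot2 is installed.

```r
# Long format: one row per combination of x value (year) and group (product)
sales <- data.frame(
  year    = rep(2019:2021, times = 2),
  product = rep(c("Product A", "Product B"), each = 3),
  units   = c(10, 12, 15, 5, 7, 6)
)
print(sales)

# Build and draw the plot only if ggplot2 is available
p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(sales, ggplot2::aes(x = year, y = units, fill = product)) +
    ggplot2::geom_area()   # stacks the groups (position = "stack" is the default)
  print(p)
}
```

Data stored in wide format (one column per series) has to be reshaped into this layout first, so that the group can be mapped to fill.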

Now, let us plot a Stacked Area Plot in R. We need the following R packages:

library(tidyverse)  # loads ggplot2, which provides the function geom_area()
library(gcookbook)  # datasets from the book R Graphics Cookbook, including uspopage
Fig.1: An example of the stacked area plot

Plotting the dataset "uspopage" (US population in thousands by age group and year) using the function geom_area() from the ggplot2 package:

#Fig.1
ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
  geom_area()

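From this Stacked Area Plot, we can follow the evolution of the total US population over the years, which grows steadily with time. The band for the population older than 64 years also becomes noticeably wider. Trends of the other individual age groups are harder to judge by eye, because each band sits on top of the bands below it.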
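Since individual bands are hard to judge by eye, the trends read from Fig.1 can be checked against the underlying numbers. This base-R sketch assumes that the gcookbook package is installed (it does nothing otherwise) and that the oldest age group is labelled ">64" in uspopage; it prints the total population and the population older than 64 for the first and last year in the dataset.

```r
# Sketch: check trends from Fig.1 against the numbers in uspopage.
# Assumes gcookbook is installed and the oldest group is labelled ">64".
totals <- NULL
if (requireNamespace("gcookbook", quietly = TRUE)) {
  env <- new.env()
  utils::data("uspopage", package = "gcookbook", envir = env)
  pop <- env$uspopage

  # Total population per year (thousands): the upper edge of the whole plot
  totals <- tapply(pop$Thousands, pop$Year, sum)

  # Population older than 64 per year (thousands): thickness of the ">64" band
  old <- pop[pop$AgeGroup == ">64", ]
  over_64 <- tapply(old$Thousands, old$Year, sum)

  # Compare the first and the last year in the dataset
  print(totals[c(1, length(totals))])
  print(over_64[c(1, length(over_64))])
}
```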

Additional customization

Fig.2: Stacked area plot after customization.

Additionally, we can customize the format of the plot. Starting from the previous example, we draw thin black outlines around the areas, make the fills semi-transparent, color them with different tones of the “Blues” palette, and add a title, a subtitle and axis labels.

ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
  # thin black outlines and semi-transparent fills
  # (linewidth replaced size for lines in ggplot2 3.4.0)
  geom_area(colour = "black", linewidth = .2, alpha = .4) +
  scale_fill_brewer(palette = "Blues") +
  labs(title = "US Population by Age",
       subtitle = "Between 1900 and 2000",
       x = "Year",
       y = "Population (Thousands)")
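Following the best practice on the order of the data series, the stacking order can also be changed in ggplot2. The sketch below is an illustration, not part of the original example: it reuses the styling of Fig.2, passes position_stack(reverse = TRUE) to geom_area() to reverse ggplot2's default stacking order, and reverses the legend with guide_legend(reverse = TRUE) so that the legend lists the age groups in the same order as the bands. It assumes ggplot2 3.4.0 or later (for linewidth) and the gcookbook package, and does nothing if either is missing.

```r
p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("gcookbook", quietly = TRUE)) {
  library(ggplot2)
  env <- new.env()
  utils::data("uspopage", package = "gcookbook", envir = env)

  p <- ggplot(env$uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
    # reverse the default stacking order of the age groups
    geom_area(colour = "black", linewidth = .2, alpha = .4,
              position = position_stack(reverse = TRUE)) +
    scale_fill_brewer(palette = "Blues") +
    # reverse the legend so it follows the new stacking order
    guides(fill = guide_legend(reverse = TRUE))
  print(p)
}
```

The total (upper edge) is unchanged; only the band that sits on the flat baseline, and is therefore easiest to read, changes.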

References

  • R Graphics Cookbook, 2nd edition by Winston Chang

The author of this entry is Jose Machuca.