Simple data visualisation

From Sustainability Methods

Scatter Plot

Description

Scatter plots can be useful for showing the relationship between two things, because they allow you to encode data simultaneously on a horizontal x-axis and vertical y-axis to see whether and what relationship exists.

(Cole Nussbaumer Knaflic, Storytelling with Data)

You can create a scatter plot when you have a pair of continuous (numeric) variables.

Examples in R

Example 1: Basic Scatter Plot

The basic scatter plot that we will plot is based on a dataset called trees, which comes built in with R.

The data set contains data on the girth, height and volume of different trees.

We will first plot tree height against tree girth.

Structure of the Data

The data frame for the trees dataset looks like this:

Girth Height Volume
8.3 70 10.3
8.6 65 10.3
8.8 63 10.2
... ... ...

Here, the data for all the columns are numeric. So, no further data transformation is necessary.
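A quick way to confirm this before plotting is to inspect the column types. The following is a minimal sketch using base R only; `numeric_cols` is an illustrative name and not part of the original example.

```r
# Show the structure of the built-in trees data set
str(trees)

# Check column by column whether the values are numeric
numeric_cols <- sapply(trees, is.numeric)
print(numeric_cols)

# Stop early if any column would need to be transformed first
stopifnot(all(numeric_cols))
```

If one of the columns were stored as text or as a factor, the last line would raise an error, which signals that the data needs to be converted before it can be used in a scatter plot.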

R Code to Plot the Data

# look at the data
head(trees)

# Plot a basic scatter plot
plot(x = trees$Girth, y = trees$Height)

Result in R

This is a basic scatter plot made using R.

Example 2: Better Scatter Plot

In this section, we will take the plot from the previous example and customize it by changing the shape and color of the points, and by adding a title and x- and y-axis labels.

R code to plot the chart

# look at the data
head(trees)

# Create a scatter plot with labels and colors
plot(x=trees$Girth, y=trees$Height, # choose the x- and y-values
     pch=16,                        # choose how points look on the plot
     col='blue',                    # choose the color of the points
     main='Scatter Plot of Girth and Height of Trees', # main header of the plot
     xlab='Tree girth', ylab='Tree height')            # x- and y-axis labels

Result in R

Minor customizations make the plot look more professional and understandable.
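To judge whether and what relationship exists between girth and height, it can help to overlay a fitted straight line and to compute a correlation coefficient. The sketch below assumes that a linear trend is a reasonable first look at the data; `fit` and `r` are illustrative names, not part of the original example.

```r
# Redraw the customized scatter plot
plot(x = trees$Girth, y = trees$Height,
     pch = 16, col = 'blue',
     main = 'Scatter Plot of Girth and Height of Trees',
     xlab = 'Tree girth', ylab = 'Tree height')

# Fit a simple linear model of height on girth and overlay it
fit <- lm(Height ~ Girth, data = trees)
abline(fit, col = 'red', lwd = 2)

# Pearson correlation between the two variables
r <- cor(trees$Girth, trees$Height)
print(r)
```

A value of `r` close to 1 or -1 would indicate a strong linear relationship, while a value close to 0 would indicate a weak one.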

Bar chart

Description

(Also known as: column chart)

A bar chart displays quantitative values for different categories. The chart comprises line marks (bars) – not rectangular areas – with the size attribute (length or height) used to represent the quantitative value for each category.

(Andy Kirk, Data Visualization)

General Structure of Bar Chart

This figure shows the structure of a bar chart.

Example in R

We will plot the bar chart shown in the general structure section above. The chart is based on mtcars, a dataset built into R. The data set contains the specifications of different cars. One such specification is the number of gears in a car's transmission. We will first create a summary table that contains the number of cars for each gear count. Then, we will use that table to create the plot.

Structure of the Data

The table that contains information about the frequency of cars for a given number of gears looks like this:

gears freq
3 15
4 12
5 5

Here, the values in the gears column are categories, and the values in the freq column are numeric.
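The summary table can be built directly from mtcars. The following sketch shows one possible way to do this with base R; the names `gear_counts` and `gear_table`, as well as the column names `gears` and `freq`, are chosen here to match the table above and are not prescribed by R.

```r
# Count the cars for each number of gears
gear_counts <- table(mtcars$gear)

# Convert the counts into a two-column data frame
gear_table <- as.data.frame(gear_counts, stringsAsFactors = FALSE)
names(gear_table) <- c("gears", "freq")
print(gear_table)
```

Note that `barplot()` accepts the output of `table()` directly, so the conversion to a data frame is only needed if you want to inspect or reuse the summary as a table.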

Example 1: Basic Bar Chart

R code to plot the chart

# get the data
gears <- table(mtcars$gear)

# Plot a basic bar chart with a title and labels
barplot(gears,
        main = "Frequency of Vehicles of each Gear Type",   # title of the plot
        xlab = "Number of Gears", ylab = "Number of Cars")  # labels of the plot

Result in R

This is how the output looks in R.

Bar Chart.png

Line chart

Description

A line chart shows how quantitative values for different categories have changed over time. Line charts are typically structured around a temporal x-axis with equal intervals from the earliest to the latest point in time. Quantitative values are plotted using joined-up lines that connect consecutive points positioned along a y-axis. The slope between the two ends of each line segment indicates the local trend between those points in time. Extending this sequence across the whole time frame forms an overall line that represents the quantitative change over time for a single categorical value.

Multiple categories can be displayed in the same view, each represented by a unique line. Sometimes a point (circle/dot) is also used to make individual values more visible. The lines used in a line chart will generally be straight. However, curved line interpolation may sometimes be used as a method of estimating values between known data points. This approach can be useful to help emphasise a general trend. While this might slightly compromise the visual accuracy of discrete values, it will have less impact if your values are already approximations.

(Note: this description is based on the book "Data Visualization" by Andy Kirk.)

Examples in R

We will first plot the line chart shown in the section above.

The basic line chart that we will plot is based on a built-in dataset called EuStockMarkets. The data set contains the daily closing prices of four major European stock indices (DAX, SMI, CAC and FTSE) from 1991 to 1998.

To make things easier, we will first transform the built-in dataset into a data frame object. Then, we will use that data frame to create the plot.

Structure of the Data

The table that contains information about the different market indices looks like this:

DAX SMI CAC FTSE
1628.75 1678.1 1772.8 2443.6
1613.63 1688.5 1750.5 2460.2
1606.51 1678.6 1718.0 2448.2
... ... ... ...

Here, the data for all the columns are numeric.
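The conversion to a data frame can be inspected with a few lines of base R. This is a minimal sketch; one thing it illustrates is that the time stamps belong to the original time series object and are not carried over as a column of the data frame.

```r
# EuStockMarkets is stored as a multivariate time series
print(class(EuStockMarkets))

# Converting it yields an ordinary data frame with one column per index
eu_stocks <- as.data.frame(EuStockMarkets)
print(head(eu_stocks, 3))

# The start and end of the time index live on the original object
print(start(EuStockMarkets))
print(end(EuStockMarkets))

# Plotting a single column of the time series keeps the calendar x-axis
plot(EuStockMarkets[, "DAX"], ylab = "DAX")
```

When the data frame is plotted instead, the x-axis shows the position of each observation rather than a calendar date.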

Example 1: Basic Line Chart

This line chart shows how the DAX index from the table in the previous section changes over time.

R code to plot the chart

# read the data as a data frame
eu_stocks <- as.data.frame(EuStockMarkets)

# Plot a basic line chart
plot(eu_stocks$DAX,  # simply select a stock index
     type='l')       # choose 'l' for line chart

Result in R

Simple line chart.png

As you can see, the plot is very simple. We can enhance the way this plot looks by making a few tweaks as shown in the section below.

Example 2: Better Looking Line Chart

Here, we will plot the DAX index again as we did in Example 1. However, the plot will be enhanced to be more informative and aesthetically pleasing.

R code to plot the chart

# get the data
eu_stocks <- as.data.frame(EuStockMarkets)

# Plot a customized line chart
plot(eu_stocks$DAX, # select the data
     type='l',      # choose 'l' for line chart
     col='blue',    # choose the color of the line
     lwd = 2,       # choose the line width
     main = 'Line Chart of DAX Index (1991-1998)',         # title of the plot
     xlab = 'Time (1991 to 1998)', ylab = 'Closing value (index points)') # x- and y-axis labels

Result in R

Line chart.png

You can see that this plot looks much more informative and attractive.
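As the description notes, multiple categories can be displayed in the same view, each as its own line. The following sketch shows one way to do this in base R with two of the indices; the choice of DAX and FTSE, the colors, and the names `dax` and `ftse` are illustrative assumptions.

```r
# Two indices from the built-in EuStockMarkets data set
dax  <- as.numeric(EuStockMarkets[, "DAX"])
ftse <- as.numeric(EuStockMarkets[, "FTSE"])

# Draw the first line; set the y-axis range so that both series fit
plot(dax, type = 'l', col = 'blue',
     ylim = range(c(dax, ftse)),
     main = 'DAX and FTSE (1991-1998)',
     xlab = 'Observation', ylab = 'Closing value')

# Add the second series as its own line
lines(ftse, col = 'darkorange')

# Label each line so the two categories can be told apart
legend('topleft', legend = c('DAX', 'FTSE'),
       col = c('blue', 'darkorange'), lty = 1)
```

Setting `ylim` from both series matters here: without it, the axis range would be taken from the first series only, and parts of the second line could fall outside the plot.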

Histogram

Description

A histogram displays the frequency and distribution for a range of quantitative groups. Whereas bar charts compare quantities for different categories, a histogram technically compares the number of observations across a range of value ‘bins’ using the size of lines/bars (if the bins relate to values with equal intervals) or the area of rectangles (if the bins have unequal value ranges) to represent the quantitative counts. With the bins arranged in meaningful order (that effectively form ordinal groupings) the resulting shape formed reveals the overall pattern of the distribution of observations.

(Andy Kirk, Data Visualization)

General Structure of Histogram

This is how a histogram looks.

Examples in R

We will first plot the histogram shown in the general structure section above.

The basic histogram that we will plot will be based on a built-in dataset called cars. This data set contains data on stopping distance of different cars at different speeds.

Since both the values are numeric, we don't need to transform the data in any way in order to plot a histogram.

Structure of the Data

The table that contains information about the stopping distance of different cars at a given speed looks like this:

speed dist
4 2
4 10
7 4
7 22
8 16
9 10
... ...

Here, the data for both speed and dist columns are numeric.

Example 1: Basic Histogram (with speed variable)

R code to plot the chart

# look at the data that we are going to use
head(cars)

# Plot a basic histogram
hist(cars$speed,
     main = "Histogram for speed of cars", # main title
     xlab = "Speed") # x-axis label

Result in R

Simple Histogram.png
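The bins and counts behind a histogram can be inspected without drawing anything. The sketch below relies on the fact that `hist()` returns an object containing the bin boundaries and the counts; `h` is an illustrative name.

```r
# Compute the histogram of speed without drawing it
h <- hist(cars$speed, plot = FALSE)

# Bin boundaries chosen by R
print(h$breaks)

# Number of observations that fall into each bin
print(h$counts)

# Every observation is counted exactly once
print(sum(h$counts) == nrow(cars))
```

Looking at these values is a simple way to check that the shape of the plot reflects the data and not just the choice of bins.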

Example 2: Better looking Histogram (with dist variable)

R code to plot the chart

# look at the data that we are going to use
head(cars)

# Plot a customized histogram
hist(cars$dist,
     breaks = 15, # suggest the number of bins (R treats a single number as a hint)
     col = 'seagreen', # define the color of the bars in the histogram
     main = "Histogram for stopping distance of cars", # main title
     xlab = "Stopping Distance") # x-axis label

Result in R

This is a better looking histogram.
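When `breaks` is given as a single number, R may choose a different number of bins in order to get round boundaries. If exact bins are needed, a vector of bin edges can be passed instead. The following is a sketch under the assumption that bins of width 10 from 0 to 120 are wanted, which covers the full range of stopping distances in cars; `bin_edges` and `h` are illustrative names.

```r
# Bin edges every 10 units; they must span the full range of the data
bin_edges <- seq(0, 120, by = 10)

# Plot the histogram with exactly these bins
h <- hist(cars$dist,
          breaks = bin_edges,
          col = 'seagreen',
          main = "Histogram for stopping distance of cars",
          xlab = "Stopping Distance")

# The returned object confirms which bin edges were used
print(h$breaks)
```

If the edges did not cover every value in the data, `hist()` would stop with an error, so it is worth checking `range(cars$dist)` before fixing the bins by hand.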
