# Simple data visualisation

#### Scatter Plot

**Description**
Scatter plots can be useful for showing the relationship between two things, because they allow you to encode data simultaneously on a horizontal x‐axis and vertical y‐axis to see whether and what relationship exists.

*- Cole Nussbaumer Knaflic - Storytelling with Data*

You can create a scatter plot when you have a pair of continuous (numeric) variables.

**Examples in R**

**Example 1: Basic Scatter Plot**
The basic scatter plot that we will plot is based on `trees`, a data set that comes built in with R.

The data set contains the girth, height and volume of different trees.

We will plot the girth of each tree against its height.

**Structure of the Data**
The `trees` data frame looks like this:

| Girth | Height | Volume |
|---|---|---|
| 8.3 | 70 | 10.3 |
| 8.6 | 65 | 10.3 |
| 8.8 | 63 | 10.2 |
| ... | ... | ... |

Here, the data for all the columns are numeric. So, no further data transformation is necessary.
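One way to confirm this before plotting is to inspect the column types. The following is a minimal sketch in base R; `numeric_cols` is a name introduced here for illustration only.

```r
# Inspect the structure of the built-in trees data set
str(trees)

# Check each column: TRUE means the column is numeric
numeric_cols <- sapply(trees, is.numeric)
print(numeric_cols)

# Stop early if any column would need conversion before plotting
stopifnot(all(numeric_cols))
```

If a column were stored as text or as a factor, the last line would raise an error, signalling that a conversion step is needed first.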

**R Code to Plot the Data**

    # Look at the data
    head(trees)

    # Plot a basic scatter plot
    plot(x = trees$Girth, y = trees$Height)

**Example 2: Better Scatter Plot**
In this section, we will take the plot from the previous example and customize it by changing the shape and color of the points, and by adding a title and x- and y-axis labels to the plot.

R code to plot the chart

    # Look at the data
    head(trees)

    # Create a scatter plot with labels and colors
    plot(x = trees$Girth, y = trees$Height,   # choose the x- and y-values
         pch = 16,                            # choose how points look on the plot
         col = 'blue',                        # choose the color of the points
         main = 'Scatter Plot of Girth and Height of Trees',   # main header of the plot
         xlab = 'Tree girth', ylab = 'Tree height')            # x- and y-axis labels

Minor customizations make the plot look more professional and understandable.
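Since a scatter plot is meant to show whether a relationship exists, it can help to overlay a fitted line. The sketch below is one possible extension and not part of the original examples: it assumes a straight-line fit is a reasonable summary, and `fit` and `r` are names chosen here for illustration.

```r
# Same customized scatter plot as in Example 2
plot(x = trees$Girth, y = trees$Height,
     pch = 16, col = 'blue',
     main = 'Scatter Plot of Girth and Height of Trees',
     xlab = 'Tree girth', ylab = 'Tree height')

# Fit a straight line: Height as a function of Girth
fit <- lm(Height ~ Girth, data = trees)

# Draw the fitted line on top of the points
abline(fit, col = 'red', lwd = 2)

# Pearson correlation as a single-number summary of the linear relationship
r <- cor(trees$Girth, trees$Height)
print(r)
```

A correlation close to 0 suggests little linear relationship, while values closer to 1 or -1 suggest a stronger one.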

**Related Links**

- Histogram
- density plot
- box plot

#### Bar chart

**Description**
(Also known as: column chart)

A bar chart displays quantitative values for different categories. The chart comprises line marks (bars) – not rectangular areas – with the size attribute (length or height) used to represent the quantitative value for each category.
*- Andy Kirk - Data Visualization*

**Example in R**

The bar chart that we will plot is based on `mtcars`, a data set built into R. It contains specifications of different cars, one of which is the number of gears in a car's transmission.
We will first create a summary table that contains the number of cars for each gear count. Then we will use that table to create the plot.

**Structure of the Data**
The table that contains the frequency of cars for each number of gears looks like this:

| gears | freq |
|---|---|
| 3 | 15 |
| 4 | 12 |
| 5 | 5 |

Here, the values in the `gears` column are categories, and the values in the `freq` column are numeric.
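The summary table above can be produced along the following lines. This is a minimal sketch; `gear_counts` and `gear_table` are names chosen here for illustration, and the column names are set by hand to match the table.

```r
# Count the cars for each number of gears
gear_counts <- table(mtcars$gear)

# Convert the counts to a data frame and rename the columns
gear_table <- as.data.frame(gear_counts, stringsAsFactors = FALSE)
names(gear_table) <- c("gears", "freq")

print(gear_table)
```

The `gears` column is kept as text here because it acts as a category label, not as a quantity.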

**Example 1: Basic Bar Chart**

R code to plot the chart

    # Get the data
    gears <- table(mtcars$gear)

    # Plot a basic bar chart with a title and labels
    barplot(gears,
            main = "Frequency of Vehicles of each Gear Type",   # title of the plot
            xlab = "Number of Gears", ylab = "Number of Cars")  # labels of the plot
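Because the length of each bar encodes a count, it is sometimes useful to print that count on the bar as well. The sketch below is an optional extension and not part of the original example. It relies on `barplot()` returning the x-coordinates of the bar midpoints; `bar_mids` is a name introduced here.

```r
# Get the data
gears <- table(mtcars$gear)

# barplot() returns the x-coordinates of the bar midpoints
bar_mids <- barplot(gears,
                    main = "Frequency of Vehicles of each Gear Type",
                    xlab = "Number of Gears",
                    ylab = "Number of Cars",
                    ylim = c(0, max(gears) * 1.2))  # leave room for the labels

# Write each count just above its bar
text(x = bar_mids, y = as.vector(gears),
     labels = as.vector(gears), pos = 3)
```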


#### Line chart

**Description**
A line chart shows how quantitative values for different categories have changed over time. It is typically structured around a temporal x-axis with equal intervals from the earliest to the latest point in time. Quantitative values are plotted using joined-up lines that connect consecutive points positioned along a y-axis. The slope between the two ends of each line segment indicates the local trend between two points in time. Extended across the whole time frame, the segments form an overall line that tells the story of quantitative change over time for a single categorical value.

Multiple categories can be displayed in the same view, each represented by a unique line. Sometimes a point (circle or dot) is also used to make individual values more visible. The lines in a line chart are generally straight. However, curved line interpolation is sometimes used to estimate values between known data points, which can help emphasise a general trend. This may slightly compromise the visual accuracy of discrete values, but the impact is smaller if the values are already approximations.

*(Note: the description is based on the book "Data Visualization" by Andy Kirk.)*
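To illustrate several categories in one view, the sketch below draws two lines in the same plot using the built-in `EuStockMarkets` data set, which the examples below also use. The choice of the two indices, the colors and the axis labels are assumptions made here for illustration.

```r
# Convert the built-in time series to a data frame
eu_stocks <- as.data.frame(EuStockMarkets)

# Shared y-axis range so that both lines fit in the plot
y_range <- range(eu_stocks$DAX, eu_stocks$FTSE)

# First category: create the plot and draw its line
plot(eu_stocks$DAX, type = 'l', col = 'blue', ylim = y_range,
     main = 'DAX and FTSE', xlab = 'Observation (trading day)',
     ylab = 'Closing value')

# Second category: add a line to the existing plot
lines(eu_stocks$FTSE, col = 'darkorange')

# Legend that maps each color to its category
legend('topleft', legend = c('DAX', 'FTSE'),
       col = c('blue', 'darkorange'), lty = 1)
```

Setting `ylim` from both series up front matters because `lines()` does not rescale the axes; a second series outside the original range would otherwise be clipped.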

**Examples in R**

The basic line chart that we will plot is based on a built-in data set called `EuStockMarkets`. It contains the daily closing prices of four major European stock indices over the years 1991 to 1998.

To make things easier, we will first transform the built-in data set, which R stores as a time series object, into a data frame. Then we will use that data frame to create the plot.

**Structure of the Data**
The table that contains information about the different market indices looks like this:

| DAX | SMI | CAC | FTSE |
|---|---|---|---|
| 1628.75 | 1678.1 | 1772.8 | 2443.6 |
| 1613.63 | 1688.5 | 1750.5 | 2460.2 |
| 1606.51 | 1678.6 | 1718.0 | 2448.2 |
| ... | ... | ... | ... |

Here, the data for all the columns are numeric.
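The conversion to a data frame can be checked as follows. This is a minimal sketch; `eu_stocks` is the same name that the examples below use for the converted data.

```r
# EuStockMarkets is stored as a multivariate time series, not a data frame
print(class(EuStockMarkets))

# Convert to a data frame so that columns can be selected with $
eu_stocks <- as.data.frame(EuStockMarkets)
print(class(eu_stocks))

# First rows and column types
print(head(eu_stocks, 3))
str(eu_stocks)
```

Note that the conversion drops the time information attached to the series: the data frame keeps only the values, in their original order.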

**Example 1: Basic Line Chart**
This line chart shows how the `DAX` index from the table in the previous section changed over time.

R code to plot the chart

    # Read the data as a data frame
    eu_stocks <- as.data.frame(EuStockMarkets)

    # Plot a basic line chart
    plot(eu_stocks$DAX,   # simply select a stock index
         type = 'l')      # choose 'l' for line chart


As you can see, the plot is very simple. We can enhance the way this plot looks by making a few tweaks as shown in the section below.

**Example 2: Better Looking Line Chart**
Here, we will plot the DAX index again as we did in Example 1. However, the plot will be enhanced to be more informative and aesthetically pleasing.

R code to plot the chart

    # Get the data
    eu_stocks <- as.data.frame(EuStockMarkets)

    # Plot an enhanced line chart
    plot(eu_stocks$DAX,     # select the data
         type = 'l',        # choose 'l' for line chart
         col = 'blue',      # choose the color of the line
         lwd = 2,           # choose the line width
         main = 'Line Chart of DAX Index (1991-1998)',          # title of the plot
         xlab = 'Time (1991 to 1998)', ylab = 'Prices in EUR')  # x- and y-axis labels


You can see that this plot looks much more informative and attractive.
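When a column of a data frame is plotted, the x-axis shows the observation number and not the date. As an alternative sketch, and not the approach taken in the examples above, the original time series object can be plotted directly so that the x-axis shows years. `dax_ts` and the axis labels are names chosen here for illustration.

```r
# Select the DAX column; the result is still a time series (class "ts")
dax_ts <- EuStockMarkets[, "DAX"]

# plot() on a ts object draws a line by default and
# uses the time information stored in the series for the x-axis
plot(dax_ts, col = 'blue', lwd = 2,
     main = 'Line Chart of DAX Index (1991-1998)',
     xlab = 'Year', ylab = 'DAX closing value')
```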


#### Histogram

**Description**
A histogram displays the frequency and distribution for a range of quantitative groups. Whereas bar charts compare quantities for different categories, a histogram technically compares the number of observations across a range of value 'bins', using the size of lines/bars (if the bins relate to values with equal intervals) or the area of rectangles (if the bins have unequal value ranges) to represent the quantitative counts. With the bins arranged in a meaningful order (effectively forming ordinal groupings), the resulting shape reveals the overall pattern of the distribution of observations.

*- Andy Kirk - Data Visualization*
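To make the idea of bins concrete, the sketch below computes a histogram without drawing it and then reproduces the counts by hand. It uses the built-in `cars` data set that the examples below also use; `h` and `bin_counts` are names introduced here for illustration.

```r
# Compute the histogram without drawing it
h <- hist(cars$speed, plot = FALSE)

# Bin edges chosen by R and the number of observations in each bin
print(h$breaks)
print(h$counts)

# The same counts, obtained by cutting the values at those edges
bin_counts <- table(cut(cars$speed, breaks = h$breaks, include.lowest = TRUE))
print(bin_counts)
```

Each observation falls into exactly one bin, so the counts add up to the number of rows in the data.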

**Examples in R**

The basic histogram that we will plot is based on a built-in data set called `cars`. This data set contains the stopping distances of cars at different speeds.

Since both variables are numeric, we don't need to transform the data in any way in order to plot a histogram.

**Structure of the Data**
The table that contains information about the stopping distance of different cars at a given speed looks like this:

| speed | dist |
|---|---|
| 4 | 2 |
| 4 | 10 |
| 7 | 4 |
| 7 | 22 |
| 8 | 16 |
| 9 | 10 |
| ... | ... |

Here, the data in both the `speed` and `dist` columns are numeric.

**Example 1: Basic Histogram**
(with the `speed` variable)

R code to plot the chart

    # Look at the data that we are going to use
    head(cars)

    # Plot a basic histogram
    hist(cars$speed,
         main = "Histogram for speed of cars",   # main title
         xlab = "Speed")                         # x-axis label


**Example 2: Better Looking Histogram**
(with the `dist` variable)

R code to plot the chart

    # Look at the data that we are going to use
    head(cars)

    # Plot a customized histogram
    hist(cars$dist,
         breaks = 15,        # suggested number of bins (R may adjust it)
         col = 'seagreen',   # color of the bars in the histogram
         main = "Histogram for stopping distance of cars",   # main title
         xlab = "Stopping Distance")                         # x-axis label
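A single number passed to `breaks` is treated by `hist()` as a suggestion only, so the plot may not contain exactly that many bins. If exact bins matter, a vector of bin edges can be passed instead. The sketch below assumes bins of width 10 from 0 to 120, which covers the range of `dist`; `edges`, `h_suggested` and `h_exact` are names chosen here for illustration.

```r
# A single number for breaks is only a suggestion; R picks "pretty" edges
h_suggested <- hist(cars$dist, breaks = 15, plot = FALSE)
print(length(h_suggested$counts))   # number of bins actually used

# A vector of edges is used exactly as given; it must cover the data range
edges <- seq(0, 120, by = 10)
h_exact <- hist(cars$dist, breaks = edges, plot = FALSE)
print(length(h_exact$counts))

# Draw the histogram with the explicit bin edges
hist(cars$dist, breaks = edges, col = 'seagreen',
     main = "Histogram for stopping distance of cars",
     xlab = "Stopping Distance")
```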


**Related Links**

- Scatter plot
- density plot
- Boxplot