Descriptive statistics
[Figure: This graphic visualizes what the mean, mode and median reveal about a dataset.]
Revision as of 16:06, 9 February 2021
Descriptive statistics are what most people think statistics are all about. Many believe that simply observing whether something is more or less, or merely calculating an average value, is all there is to statistics. The media often show us such descriptive statistics in whimsical bar plots or even pie charts.
Mean
The mean is the average of a set of numbers. You calculate it by adding up all the numbers and then dividing the sum by how many numbers there are in total.
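As a minimal sketch of this calculation, the R snippet below computes the mean of a small, made-up vector by hand (sum divided by count) and compares it with base R's `mean()`. The values in `x` are purely illustrative.

```r
# Hypothetical example values, chosen only for illustration
x <- c(2, 4, 4, 5, 7, 8)

# Mean by hand: add up all values, then divide by how many there are
manual_mean <- sum(x) / length(x)

# Base R's mean() performs the same calculation
builtin_mean <- mean(x)

manual_mean   # 5
builtin_mean  # 5
```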
Median
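The median is the middle value of a sorted set of numbers; if the set has an even number of values, it is the average of the two middle values. It can differ substantially from the mean, for instance when your data contain large gaps or cover a wide range. Because the median depends only on the position of values in the sorted order and not on their magnitude, it is more robust against outliers than the mean.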
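A minimal sketch in R of how the median resists an outlier, using made-up values. The helper `manual_median()` is a hypothetical function written here only to show the sort-and-pick-the-middle logic; in practice base R's `median()` does this for you.

```r
# Hypothetical values with one extreme outlier (1000)
x <- c(3, 5, 7, 9, 1000)

# Hypothetical helper: sort the values and pick the middle one(s)
manual_median <- function(v) {
  v <- sort(v)
  n <- length(v)
  if (n %% 2 == 1) {
    v[(n + 1) / 2]                 # odd count: the single middle value
  } else {
    (v[n / 2] + v[n / 2 + 1]) / 2  # even count: average of the two middle values
  }
}

manual_median(x)  # 7
median(x)         # 7
mean(x)           # 204.8, pulled far upward by the outlier
```

Replacing 1000 with 11 leaves the median at 7, while the mean drops to 7; only the mean reacts to the size of the extreme value.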
Mode
The mode is the value that appears most often. It can be helpful in large datasets or when your data contain many repeated values.
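Base R has no built-in function for the statistical mode: `mode()` reports an object's storage mode (for example `"numeric"`) instead. The sketch below therefore counts frequencies with `table()`. The function name `stat_mode()` and the example values are hypothetical, and this version assumes numeric data.

```r
# Hypothetical values; 4 appears most often
x <- c(1, 2, 4, 4, 4, 5, 5, 7)

# Hypothetical helper for the statistical mode (numeric data assumed)
stat_mode <- function(v) {
  counts <- table(v)                             # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]    # all values tied for the highest count
  as.numeric(top)                                # table names are strings; convert back
}

stat_mode(x)  # 4
mode(x)       # "numeric" (storage mode, not the statistical mode)
```

Returning every tied value means a dataset with two equally frequent values yields both, which makes a bimodal distribution visible rather than silently picking one.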
Range
The range is simply the difference between the highest and the lowest value, so it is calculated by subtracting the minimum from the maximum.
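One detail worth knowing in R: the base function `range()` returns the minimum and the maximum as a two-element vector, not their difference. This small sketch with made-up values shows two ways to get the range as a single number.

```r
# Hypothetical example values
x <- c(12, 3, 8, 21, 5)

range(x)         # returns c(3, 21): the minimum and the maximum
diff(range(x))   # 18: maximum minus minimum
max(x) - min(x)  # 18: the same calculation written out
```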
Standard deviation
The standard deviation is the square root of the variance, which is computed from how far each data point lies from the mean. It measures how spread out your numbers are: the further the data points lie from the mean, the higher the standard deviation.
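A step-by-step sketch in R with made-up values. Note that R's `sd()` and `var()` use the sample variance, dividing the sum of squared deviations by n - 1 rather than n; the manual calculation below follows that convention so the results match.

```r
# Hypothetical example values (mean is 5)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Step 1: how far each data point lies from the mean
deviations <- x - mean(x)

# Step 2: sample variance = sum of squared deviations / (n - 1),
# which is what var() computes
variance <- sum(deviations^2) / (n - 1)

# Step 3: the standard deviation is the square root of the variance
manual_sd <- sqrt(variance)

manual_sd  # about 2.14
sd(x)      # same value
```

Dividing by n instead of n - 1 gives the population standard deviation, which is exactly 2 for these values.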
#descriptive statistics using the built-in swiss dataset (datasets package)
swiss
swiss_data <- swiss
#we are choosing the column Fertility for this example
#let's begin with calculating the mean
mean(swiss_data$Fertility)
#median
median(swiss_data$Fertility)
#range() returns the minimum and the maximum, not their difference
range(swiss_data$Fertility)
#the range as a single number (maximum minus minimum)
diff(range(swiss_data$Fertility))
#standard deviation
sd(swiss_data$Fertility)
#summary - includes minimum, maximum, mean, median, 1st & 3rd quartile
summary(swiss_data$Fertility)
External Links
Videos
Descriptive Statistics: A whole video series about descriptive statistics from Khan Academy
Standard Deviation: A brief explanation
Mode, Median, Mean, Range & Standard Deviation: A good summary
Articles
Descriptive Statistics: An introduction
Descriptive Statistics: A detailed summary