Difference between revisions of "Descriptive statistics"

From Sustainability Methods
 
(18 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
__NOTOC__
 +
[[File:Bildschirmfoto 2020-03-28 um 15.39.37.png|200px|right|frameless|]]
  
[[File:Bildschirmfoto 2020-03-28 um 15.39.37.png|thumb|Descriptive Statistics is the most basic things you can do in statistics. Most of you probably also already calculated things like mean and median in school.]]
+
'''Descriptive stats are what most people think stats are all about.''' Many people believe that the simple observation of ''more'' or ''less'', or the mere calculation of an average value, is what statistics are all about. Of course, this is not the case - statistics is more than descriptive statistics, or whimsical [[Introduction to statistical figures|bar plots or even pie charts]]. Still, knowing the basics is important, and most of you probably already calculated things like mean and median in school. So let us have another look to refresh your memory.
  
Descriptive stats are what most people think stats are all about. Many people believe that the simple observation of more or less, or the mere calculation of an average value is what statistics are all about. The media often shows us such descriptive statistics in whimsical bar plots or even pie charts.
+
== Basics of descriptive statistics ==
 
+
[[File:Bildschirmfoto 2020-03-28 um 15.48.41.png|200px|thumb|right|This graphic visualizes what mean, mode and median explain regarding a dataset.]]
[[File:Bildschirmfoto 2020-03-28 um 15.48.41.png|thumb|left|This graphic visualizes what mean, mode and median explain regarding a dataset.]]
 
  
 
====Mean====
 
====Mean====
The [https://www.youtube.com/watch?v=mk8tOD0t8M0 mean] is the average of numbers you can simply calculated by adding up all the numbers and then divide them by how many numbers there are in total.
+
The [https://www.youtube.com/watch?v=mk8tOD0t8M0 mean] is the average of numbers you can simply calculate by adding up all the numbers and then divide them by how many numbers there are in total.
  
 
====Median====
 
====Median====
The medium is the middle number in assorted set of numbers. It can be substantially different from the mean value for instance, when you have large gaps or cover wide ranges within your data. Therefore, it is more robust against outliers.
+
The median is the middle number in a sorted set of numbers. It can be substantially different from the mean value, for instance when you have large gaps or cover wide ranges within your [[Glossary|data]]. Therefore, it is more robust against outliers.
  
 
====Mode====
 
====Mode====
The mode is the value that appears most often. It can be helpful in large datasets are when you have a lot of repetitions within the dataset.
+
The mode is the value that appears most often. It can be helpful in large datasets or when you have a lot of repetitions within the dataset.
  
 
====Range====
 
====Range====
 
The range is simply the difference between the lowest and the highest value and consequently it can also be calculated like this.
 
The range is simply the difference between the lowest and the highest value and consequently it can also be calculated like this.
 +
 +
[[File:Bildschirmfoto 2020-03-28 um 15.51.31.png|thumb|right|This graph shows how the standard deviation is spread from the mean.]]
  
 
====Standard deviation====
 
====Standard deviation====
 
The standard deviation is calculated as the square root of variance by determining the variation between each data point relative to the mean. It is a measure of how spread out your numbers are. If the data points are further from the mean, there is a higher deviation within the data set. The higher the standard deviation, the more spread out the data.
 
The standard deviation is calculated as the square root of variance by determining the variation between each data point relative to the mean. It is a measure of how spread out your numbers are. If the data points are further from the mean, there is a higher deviation within the data set. The higher the standard deviation, the more spread out the data.
  
 +
== R examples ==
 +
Now, let us have a look at how to calculate these values in R.
 
<syntaxhighlight lang="R" line>
 
<syntaxhighlight lang="R" line>
  
Line 45: Line 50:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
[[File:Bildschirmfoto 2020-03-28 um 15.51.31.png|thumb|left|This graph shows how the standard deviation is spread from the mean.]]
 
 
='''External Links'''=
 
 
=='''Videos'''==
 
 
[https://www.youtube.com/watch?v=h8EYEJ32oQ8&list=PLU5aQXLWR3_yYS0ZYRA-5g5YSSYLNZ6Mc Descriptive Statistics]: A whole video series about descriptive statistics from the Khan academy
 
 
[https://www.youtube.com/watch?v=MRqtXL2WX2M Standard Deviation]: A brief explanation
 
 
[https://www.youtube.com/watch?v=mk8tOD0t8M0 Mode, Median, Mean, Range & Standard Deviation]: A good summary
 
 
 
=='''Articles'''==
 
  
[https://www.investopedia.com/terms/d/descriptive_statistics.asp Descriptive Statistics]: An introduction
+
==External Links==
 +
====Videos====
 +
* [https://www.youtube.com/watch?v=h8EYEJ32oQ8&list=PLU5aQXLWR3_yYS0ZYRA-5g5YSSYLNZ6Mc Descriptive Statistics]: A whole video series about descriptive statistics from the Khan academy
 +
* [https://www.youtube.com/watch?v=MRqtXL2WX2M Standard Deviation]: A brief explanation
 +
* [https://www.youtube.com/watch?v=mk8tOD0t8M0 Mode, Median, Mean, Range & Standard Deviation]: A good summary
  
[http://intellspot.com/descriptive-statistics-examples/ Descriptive Statistics]: A detailed summary
+
====Articles====
 +
* [https://www.investopedia.com/terms/d/descriptive_statistics.asp Descriptive Statistics]: An introduction
 +
* [http://intellspot.com/descriptive-statistics-examples/ Descriptive Statistics]: A detailed summary
  
 
----
 
----
 
[[Category:Statistics]]
 
[[Category:Statistics]]
 
[[Category:R examples]]
 
[[Category:R examples]]

Latest revision as of 13:24, 8 July 2021

Descriptive stats are what most people think stats are all about. Many people believe that simply observing more or less of something, or calculating an average value, is what statistics is all about. Of course, this is not the case: statistics is more than descriptive statistics, whimsical bar plots or even pie charts. Still, knowing the basics is important, and most of you have probably already calculated things like the mean and the median in school. So let us have another look to refresh your memory.

Basics of descriptive statistics

This graphic visualizes what mean, mode and median explain regarding a dataset.

Mean

The mean is the average of a set of numbers. You calculate it by adding up all the numbers and then dividing the sum by how many numbers there are in total.
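
As a small illustration, the calculation described above can be written out by hand in R and compared with the built-in mean() function. This is only a sketch: the vector values holds made-up numbers and is not data from this article.

```r
# hypothetical example values (assumption, not taken from the article)
values <- c(2, 4, 4, 5, 10)

# add up all the numbers and divide the sum by how many there are
manual_mean <- sum(values) / length(values)
manual_mean

# the built-in function should give the same result
mean(values)
```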

Median

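The median is the middle number in a sorted set of numbers; if the set contains an even number of values, it is the average of the two middle values. It can be substantially different from the mean, for instance when you have large gaps or cover wide ranges within your data. Because it depends only on the middle of the sorted values, it is more robust against outliers than the mean.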
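
To see what robustness against outliers means in practice, the following sketch adds one extreme value to a small set of numbers and compares how the mean and the median react. The numbers are made up for illustration.

```r
# hypothetical example values (assumption, not taken from the article)
values <- c(2, 4, 4, 5, 10)

# the same values with one extreme outlier added
with_outlier <- c(values, 1000)

# the mean is pulled strongly towards the outlier
mean(values)
mean(with_outlier)

# the median barely moves
median(values)
median(with_outlier)
```

With these numbers, the median only shifts from 4 to 4.5, while the mean jumps from 5 to more than 170.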

Mode

The mode is the value that appears most often. It can be helpful in large datasets or when you have a lot of repetitions within the dataset.
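
Base R has no built-in function for the statistical mode (the function mode() returns the storage type of an object instead). One possible way to find it, sketched here with made-up numbers, is to count how often each value occurs with table() and pick the most frequent one. The name stat_mode is just chosen for this example.

```r
# hypothetical example values (assumption, not taken from the article)
values <- c(3, 7, 7, 2, 7, 3)

# count how often each value appears
counts <- table(values)
counts

# keep the value(s) with the highest count;
# if several values are tied, all of them are returned
stat_mode <- as.numeric(names(counts)[counts == max(counts)])
stat_mode
```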

Range

The range is simply the difference between the lowest and the highest value, so it can be calculated by subtracting the lowest value from the highest.
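
A short sketch with made-up numbers shows this subtraction in R. Note that the R function range() returns the lowest and the highest value rather than their difference, so diff() is used here to turn the two values into a single number.

```r
# hypothetical example values (assumption, not taken from the article)
values <- c(2, 4, 4, 5, 10)

# highest value minus lowest value
manual_range <- max(values) - min(values)
manual_range

# range() returns the minimum and the maximum ...
range(values)

# ... and diff() gives the difference between the two
diff(range(values))
```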

This graph shows how the standard deviation is spread from the mean.

Standard deviation

The standard deviation is calculated as the square root of the variance, which is based on the squared differences between each data point and the mean. It is a measure of how spread out your numbers are: the further the data points are from the mean, the higher the standard deviation, and the more spread out the data.
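
The steps behind this calculation can be sketched in R with a few made-up numbers. The sketch uses the sample variance, which divides by the number of values minus one; this is also what the built-in functions var() and sd() do.

```r
# hypothetical example values (assumption, not taken from the article)
values <- c(2, 4, 4, 5, 10)

# difference between each data point and the mean
deviations <- values - mean(values)

# sample variance: sum of squared differences divided by n - 1
sample_variance <- sum(deviations^2) / (length(values) - 1)

# standard deviation: square root of the variance
manual_sd <- sqrt(sample_variance)
manual_sd

# the built-in function should give the same result
sd(values)
```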

R examples

Now, let us have a look at how to calculate these values in R.

# descriptive statistics using the built-in swiss dataset
swiss
swiss_data <- swiss

# we are choosing the column Fertility for this example
# let's begin with calculating the mean
mean(swiss_data$Fertility)

# median
median(swiss_data$Fertility)

# range - range() returns the minimum and the maximum,
# diff() turns them into the difference between the two
range(swiss_data$Fertility)
diff(range(swiss_data$Fertility))

# standard deviation
sd(swiss_data$Fertility)

# summary - includes minimum, maximum, mean, median, 1st & 3rd quartile
summary(swiss_data$Fertility)


External Links

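Videos

* [https://www.youtube.com/watch?v=h8EYEJ32oQ8&list=PLU5aQXLWR3_yYS0ZYRA-5g5YSSYLNZ6Mc Descriptive Statistics]: A whole video series about descriptive statistics from the Khan Academy
* [https://www.youtube.com/watch?v=MRqtXL2WX2M Standard Deviation]: A brief explanation
* [https://www.youtube.com/watch?v=mk8tOD0t8M0 Mode, Median, Mean, Range & Standard Deviation]: A good summary

Articles

* [https://www.investopedia.com/terms/d/descriptive_statistics.asp Descriptive Statistics]: An introduction
* [http://intellspot.com/descriptive-statistics-examples/ Descriptive Statistics]: A detailed summary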