Main Page

From Sustainability Methods

Revision as of 11:46, 16 September 2019

Welcome to Sustainability Methods!

Day 1 - Intro

  1. Do models and statistics matter? Why does it pay to be literate in statistics and R?
  2. Getting concepts clear: Generalisation, Sample, and Bias
    - See also, misunderstood concepts
  3. History of statistics

Day 2 - Data formats based on R

  1. Continuous vs. categorical, and subsets
  2. Normal distribution
  3. Poisson, binomial, Pareto

Day 3 - Simple tests

  1. Parametric and non-parametric
  2. Hypothesis testing
  3. The power of probability

Day 4 - Correlation and regression

What are correlation and regressions

Propelled by the general development of science during the Enlightenment, numbers started piling up. With more and more technological possibilities to measure information, and to store it, people started wondering whether these numbers could lead to something. The growing numbers came from diverse sources. Some came from science, such as astronomy and other branches of natural science, while other prominent sources were engineering and economics, for instance double-entry bookkeeping. It was thanks to the work of Adrien-Marie Legendre and Carl Friedrich Gauss that mathematics offered, with the method of least squares, the first approach to relate one line of data to another. How is one continuous variable related to another? Pandora's box was opened, and questions started to emerge. Economists were the first to use regression analysis at a larger scale, relating all sorts of economic and social indicators to each other and building an ever more complex system of control, management, and perhaps even understanding of statistical relations. The gross domestic product (GDP) became for quite some time a pet variable for many economists, and growth in particular became a core goal of many analyses meant to inform policy. What people basically did was ask themselves how one variable is related to another. If people's nutrition improves, do they live longer? (Yes.) If economies have a higher GDP, do they offer more social security? (No.) Does a higher income lead to more CO2 emissions at the country scale? (Yes.) As these relations came in, the question of whether two continuous variables are causally related became a nagging thought. With more and more data available, correlation became a staple of modern statistics. Several core questions arise when applying correlations and regressions.

1) Are relations between two variables positive or negative?
Relations between two variables can be positive or negative. In a positive relation, both variables increase together: taller people tend to weigh more, and smaller people have an overall lower gross calorie demand. In a negative relation, one variable decreases as the other increases. The strength of a relation, expressed through what statisticians call the estimate, is an important measure when evaluating correlations and regressions. Is a relation positive or negative, and how strong is its estimate?
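The direction and size of a relation can be made concrete with a minimal sketch of the method of least squares for one predictor, alongside Pearson's correlation coefficient r. It is written in Python purely for illustration (the course itself works in R), and the numbers are made-up toy values, not real measurements. The slope returned here plays the role of the estimate: its sign tells whether the relation is positive or negative, and r always shares that sign.

```python
# Minimal sketch: ordinary least squares with one predictor, plus Pearson's r.
# All numbers below are made-up toy values, not real measurements.

def mean(values):
    return sum(values) / len(values)

def least_squares(x, y):
    """Return (intercept, slope) of the line y = a + b*x that minimises
    the sum of squared vertical distances between line and points."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson_r(x, y):
    """Pearson correlation coefficient, always between -1 and 1."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]           # tends to rise with x: positive relation
y_down = [8 - v for v in y_up]   # mirrored series: negative relation

for label, y in [("positive", y_up), ("negative", y_down)]:
    a, b = least_squares(x, y)
    print(f"{label}: intercept={a:.2f}, slope={b:.2f}, r={pearson_r(x, y):.2f}")
# positive: intercept=2.20, slope=0.60, r=0.77
# negative: intercept=5.80, slope=-0.60, r=-0.77
```

The slope is expressed in units of y per unit of x, whereas r is unitless, which is why r is often easier to compare across different data sets.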

2) Does the relation show a strong effect, or is it rather weak? In other words, can the regression explain a lot of the variance in your data, or are the results rather weak regarding their explanatory power? Take EXAMPLE
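One common way to quantify how much variance a regression explains is the coefficient of determination, R² = 1 − SS_res / SS_tot, the share of the total variance in y that the fitted line accounts for. Whether this course uses R² at this point is an assumption; the sketch below only illustrates the idea in Python with hypothetical toy data. Both series rise with x, but one scatters tightly around its line and the other does not.

```python
def least_squares(x, y):
    """Intercept and slope of the ordinary least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

def r_squared(x, y):
    """Share of the variance in y explained by the fitted line (0 to 1)."""
    a, b = least_squares(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    ss_tot = sum((yi - my) ** 2 for yi in y)                        # total
    return 1 - ss_res / ss_tot

# Hypothetical toy data: same upward trend, very different scatter.
x = [1, 2, 3, 4, 5]
y_tight = [2.1, 3.9, 6.2, 7.8, 10.1]
y_loose = [1, 7, 3, 10, 6]

print(f"R² tight: {r_squared(x, y_tight):.3f}")
print(f"R² loose: {r_squared(x, y_loose):.3f}")
```

The tight series yields an R² close to 1, while the loose one yields roughly a third. Note that R² and statistical significance are different things: with a large enough sample, even a relation that explains little variance can be significant.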

3) A relation can explain a lot of variance for some parts of the data and less for others. Take the percentage of people working in agriculture within individual countries. At a low income (<5000 dollars/year), there is high variance: in Chad, half of the population works in agriculture, while in Zimbabwe, with an even slightly lower income, it is 10 %. At an income above 15000 dollars/year, there is hardly any variance in the share of people working in agriculture, and the proportion is consistently very low. There are reasons for this: probably one or several other variables explain at least part of the high variance within the low-income segment. Finding variables that explain part of the remaining unexplained variance is a key effort in correlation analysis.
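The idea that variance differs between segments of the data can be checked by splitting the data at the income thresholds named above and comparing the spread within each segment. This is a minimal Python sketch; the (income, agriculture share) pairs are invented for illustration and are not real country statistics, and the helper `segment_spread` is a hypothetical name introduced here.

```python
from statistics import variance

# Hypothetical (income in dollars/year, % of population in agriculture) pairs,
# invented for illustration; these are NOT real country statistics.
countries = [
    (1500, 50), (2000, 10), (3000, 35), (4000, 20),
    (16000, 3), (20000, 2), (30000, 2), (40000, 1),
]

def segment_spread(data, lower, upper):
    """Sample variance of the agriculture share for incomes in [lower, upper)."""
    shares = [share for income, share in data if lower <= income < upper]
    return variance(shares)

low = segment_spread(countries, 0, 5000)               # thresholds from the text
high = segment_spread(countries, 15000, float("inf"))
print(f"variance below 5000 dollars/year: {low:.1f}")
print(f"variance above 15000 dollars/year: {high:.1f}")
```

A natural next step would be to add a candidate explanatory variable and check whether the spread within the low-income segment shrinks once it is taken into account.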

Causal vs non-causal relations

  - See also, misunderstood concepts
  1. Are all correlations causal?
  2. Is the world linear?
  3. Transformation

Day 5 - Correlation and regression

  1. P values vs. sample size
  2. Residuals
  3. Reading correlation plots

Day 6 - Designing studies Pt. 1

  1. How do I compare more than two groups?
  2. Designing experiments - degrees of freedom
  3. One-way and two-way

Day 7 - Designing studies Pt. 2

  1. Balanced vs. unbalanced - Welcome to the Jungle
  2. Block effects
  3. Interaction and reduction

Day 8 - Types of experiments

  1. Are all laboratory experiments really conducted in labs?
  2. Are all field experiments really conducted in fields?
  3. What are natural experiments?

Day 9 - Statistics from the Faculty

Day 10 - Statistics down the road

  1. Multivariate Statistics
  2. AIC

Day 11 - The big recap

  1. Distribution & simple test
  2. Correlation and regression
    - See also, misunderstood concepts
  3. Analysis of Variance

Day 12 - Models

  1. Are models wrong?
  2. Are models causal?
  3. Are models useful?

Day 13 - Ethics and norms of statistics

  1. What is informed consent?
  2. How does a board of ethics work?
  3. How long do you store data?

