Causality and correlation

From Sustainability Methods

Note: The German version of this entry can be found here: Causality and correlation (German)

Note: This entry focuses on the connection between correlations and causality. More details on correlations as a scientific method can be found in this entry: Correlations. More details on causality in science can be found in the entry on Causality.

Correlative relations

Correlations can tell us whether two variables are related. A correlation does not tell us, however, what this relation means. This is important to note, as many correlations are being calculated, but it is up to us to interpret these relations.

It is potentially within our normative capacity to derive hypotheses, but it can also be powerful to not derive hypotheses and take a purely inductive approach to a correlation. We live in a world of big data, and increasingly so. There is a wealth of information out there, and it is literally growing by the minute. While the Enlightenment and then Modernity built a world of science based on the power of hypotheses, this science was also limited. We know today that the powerful step of building a hypothesis offers only a part of the picture, and that much progress in science came from inductive approaches. Consider antibiotics, whose discovery was a mere accident.

With the predictive power of correlations and the rise of machine learning and the associated, much more complex approaches, a new world dawned upon us. Today, predictions are based on simple correlations and the recognition of patterns in the wealth of data we face. Although much of the actual mathematics is far more complicated, the suggestions of prominent online shops on what you may want to buy next are - in principle - sophisticated elaborations of correlations. We do not understand why certain people who buy one thing also buy another thing, but promoting this relation increases sales. Of course the world is more complicated, and luckily these models cannot explain everything. I know for myself that as long as my music service suggests that I listen to Oasis - the worst insult to me - I am safe from the total prediction of the machines. Still, with predictive power comes great responsibility, and we shall see how correlations and their predictive power will allow us to derive more theories based on our inductive perspective on data. Much is to be learned through the digitalisation of data, but it is still up to us to interpret correlations. This will be very hard to teach to a machine, hence it is our responsibility to interpret data analysis through reasoning.

The Rise of Correlations

Propelled by the general development of science during the Enlightenment, numbers started piling up. The technological possibilities to measure more and more information, and to store it, kept increasing, and people started wondering whether these numbers could lead to something. The increasing numbers had diverse sources: some came from science, such as astronomy or other branches of the natural sciences. Other prominent sources of numbers were engineering and economics, for instance double-entry bookkeeping. It was thanks to the tandem efforts of Adrien-Marie Legendre and Carl Friedrich Gauss that mathematics offered the first approach to relate one line of data with another - the method of least squares.

How is one continuous variable related to another? Pandora's box was opened, and questions started to emerge. Economists were the first to utilise regression analysis at a larger scale, relating all sorts of economic and social indicators with each other, and building an ever more complex controlling, management and maybe even understanding of statistical relations. A regression implies a causal link between two continuous variables, which makes it different from a correlation, where two variables are related, but not necessarily causally linked. (For more on regressions, please refer to the entry on Regression Analysis.) The gross domestic product (GDP) became for quite some time the favorite toy of many economists, and growth became a core goal of many analyses to inform policy. What people basically did was ask themselves how one variable is related to another variable.

If nutrition of people increases, do they live longer?

There is a positive correlation between nutrition and the life expectancy of people worldwide. Source: gapminder.org


Does a high life expectancy relate to more agricultural land area within a country?

There is no correlation between life expectancy and the national amount of agricultural land. Source: gapminder.org


Is a higher income related to more CO2 emissions at a country scale?

There is a positive correlation between national wealth and CO2 emissions. Source: gapminder.org

Key elements of correlations

As these relations started coming in, the question of whether two continuous variables are causally related became a nagging thought. With more and more data being available, correlation became a staple of modern statistics. There are some core questions related to the application of correlations:
1) Are relations between two variables positive or negative, and how strong is the estimate of the relation? Being taller leads to a significant increase in body weight. Being smaller leads to an overall lower gross calorie demand. The strength of this relation - what statisticians call the estimate - is an important measure when evaluating correlations and regressions. (A regression implies a causal link between two continuous variables, which makes it different from a correlation, where two variables are related, but not necessarily causally linked.)

2) Does the relation show a significantly strong effect, or is it rather weak? In other words, can the regression explain a lot of the variance of your data, or is the result rather weak regarding its explanatory power? The correlation coefficient (https://online.stat.psu.edu/stat501/lesson/1/1.6) indicates how strong or weak the correlation is and whether it is positive or negative. It can take values between -1 and +1. The relationship between temperature in Celsius and in Fahrenheit, for example, is perfectly linear, which should not be surprising as we know that Fahrenheit is defined as 32 + 1.8 * Celsius. Furthermore, we can say that 100% of the variation in temperatures in Fahrenheit is explained by the temperature in Celsius: the correlation coefficient is 1.

3) What does the relation between two variables explain? A correlation can explain a lot of variance for some parts of the data, and less variance for other parts. Take the percentage of people working in agriculture within individual countries. At a low income (<5000 Dollar/year) there is a high variance between countries: half of the population of Chad works in agriculture, while in Zimbabwe, with an even slightly lower income, it is only 10 %. At an income above 15000 Dollar/year, however, there is hardly any variance in the share of people who work in agriculture: the proportion is always very low. This has reasons: there are probably one or several variables that explain at least part of the high variance within different income segments. Finding such variables that explain previously unexplained variance is a key effort in doing correlation analysis.
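
The Celsius and Fahrenheit example from point 2 can be checked numerically. The following is a minimal sketch in plain Python; the function name `pearson_r` and the sample temperatures are illustrative assumptions, not part of the original entry. It computes the Pearson correlation coefficient from its textbook definition and squares it to obtain the share of explained variance.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Illustrative temperatures (assumed values, not measured data)
celsius = [-10.0, 0.0, 12.5, 20.0, 37.0]
fahrenheit = [32 + 1.8 * c for c in celsius]

r = pearson_r(celsius, fahrenheit)
print(round(r, 6))       # correlation coefficient
print(round(r ** 2, 6))  # share of variance explained
```

For a perfectly linear relation such as this one, both values come out as 1 up to floating-point rounding. Flipping the sign of one of the two variables would give a coefficient of -1 instead.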

Examples for the correlation coefficient. Source: Wikipedia, Kiatdd, CC BY-SA 3.0
This scatter plot displays a moderate negative correlation between the fertility in Switzerland in 1888 and percentage of draftees receiving the highest mark on army examination. The correlation coefficient is about -0.6.
Here you can see a scatter plot which shows a weak positive correlation between the fertility in Switzerland in 1888 and the percentage of males working in agriculture at that time. The correlation coefficient is +0.3.
This plot shows no correlation between the Infant Mortality in Switzerland in 1888 and the percentage of Catholics.

Causality

Where to start, how to end?

Causality is one of the most misused and misunderstood concepts in statistics. All the while, it is at the heart of the fact that all statistics are normative. While many things can be causally linked, many are not. The problem is that we dearly want certain things to be causally linked, while we want other things not to be causally linked. This confusion has many roots, and spans untameable arenas such as faith, psychology, culture, social constructs and so on. Causality can be everything that is good about statistics, and it can be equally everything that is wrong about statistics. To put it in other words: it can be everything that is great about us humans, but it can be equally the root cause of everything that is wrong with us.

What is attractive about causality? People search for explanations, and this constant search is probably one of the building blocks of our civilisation. Humans look for reasons to explain phenomena and patterns, often with the goal of prediction. If I understand a causal relation, I may be able to know more about the future, cashing in on being either prepared for this future, or at least being able to react properly.

The problem with causality is that different branches of science as well as different streams of philosophy have different explanations of causality, and there exists an exciting diversity of theories about causality. Let us approach the topic systematically.

The high and the low road of causality

Let's take the first extreme case: the theory that storks bring the babies. Obviously this is not true. Creating a causal link between these two is obviously a mistake. Now let's take the other extreme case: you fall down a flight of stairs, and in the fall break your leg. There is obviously some form of causal link between these two events, that is, falling down the stairs caused you to break your leg. However, this already demands a certain level of abstraction, including the acceptance that it was you who fell down the stairs, you twisting your leg or hitting a stair with enough force, you being too weak to withstand the impact etc. There is, hence, a very detailed chain of events happening between you starting to lose your balance, and you breaking your leg. Our mind simplifies this into “because I fell down the stairs, I broke my leg”. Obviously, we do not blame the person who built the stairs, and we do not blame our parents for bringing us into this world, where we then broke our leg. These things are not causally linked.

But, imagine now that the construction worker did not construct the stairs the proper way, and that one stair is slightly higher than the other stairs. We now claim that it is the fault of the construction worker. However, how much higher does this one stair need to be so that we blame not ourselves, but the construction worker? You get the point.

Causality is a construction that is happening in our mind. We create an abstract view of the world, and in this abstract view we come up with a version of reality that is simplified enough to explain, for instance, future events, but it is not too simple, since this would not allow us to explain anything specific or any smaller groups of events.

Correlations can be deceitful. Source: Spurious Correlations

Causality is hence an abstraction that follows Occam's Razor, I propose. And since we all have our own version of Occam's Razor in our head, we often disagree when it comes to causality. I think this is related to the fact that everything dissolves under analysis. If we analyse any link between two events closely enough, then the causal link can also be dissolved, or can become irrelevant. Ultimately, causality is a choice.

Take the example of medical studies where most studies build on a correlative design, testing how, for instance, taking Ibuprofen may help against the common cold. If I have a headache, and I take Ibuprofen, in most cases it may help me. But do I understand how it helps me? I may understand some parts of it, but I do not really understand it on a cellular level. There is again a certain level of abstraction.

What is now most relevant for causality is the mere fact that one thing can be explained by another thing. We do not need to understand all the nitty-gritty details of everything, and I have argued above that this would ultimately be very hard on us. Instead, we need to understand whether taking one thing away prevents the other thing from happening. If I had not walked down the stairs, I would not have broken my leg.

Ever since Aristotle and his question “What is its nature?”, we are nagged by the nitty-grittiness of true causality or deep causality. I propose that there is a high road of causality, and a low road of causality. The high road aims to explain everything about how two things or phenomena are linked. While I reject this high road, many schools of thought consider it to be very relevant. I for myself prefer the low road of causality: May one thing or phenomenon be causally linked to another thing or phenomenon; if I take one away, will the other not happen? This automatically means that I have to make compromises on how much I understand about the world.

I propose we do not need to understand everything. Our ancestors did not truly understand why walking out in the dark without a torch and a weapon - or better even in a larger group - might result in death in an area with many predators. Just knowing that staying in at night would keep them safe was good enough for them. It is also good enough for me.

Simple and complex Causality

Let us try to understand causal relations step by step. To this end, we may briefly differentiate two types of causal relations.

Statistical correlations imply causality if one variable (A) is actively driven by another variable (B). If B is taken away or changed, A changes as well or becomes non-existent. This relation is among the most widely known relations in statistics, but it has certain problems. First and foremost, two variables may be causally linked but the relation may be weak. Zinc may certainly help against the common cold, but it is not guaranteed that Zinc will cure us. It is a weak causal correlation.

Second, a causal relation between variables A and B may interact with a variable C, and further variables D, E, F etc. In this case, many people speak of complex relations. Complex relations can be causal, but they are still complex, and this complexity may confuse people. Lastly, statistical relations may be affected by biases, sample size restrictions, and many other challenges that statistics faces. These challenges are known and increasingly investigated, but often not solvable.
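
The first caveat above, a relation that is causal but weak, can be made concrete with a small simulation. This is only a sketch under stated assumptions: the linear model, the effect size of 0.2, the noise level and all names (`simulate_a`, `effect`, `noise`) are hypothetical and chosen purely for illustration. Variable A is generated from B plus unrelated noise, so B truly drives A, yet the correlation stays low. Changing B while keeping everything else fixed still shifts A.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def simulate_a(b_values, noise, effect=0.2):
    """Hypothetical model: A is driven by B with a small effect, plus noise."""
    return [effect * b + e for b, e in zip(b_values, noise)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 5000
b = [rng.gauss(0, 1) for _ in range(n)]      # the driving variable B
noise = [rng.gauss(0, 1) for _ in range(n)]  # everything else that affects A

a = simulate_a(b, noise)
r = pearson_r(b, a)
print(f"correlation between B and A: {r:.2f}")  # weak, although B causes A

# Intervention: raise every B by the same amount and keep the noise fixed.
a_shifted = simulate_a([v + 2.0 for v in b], noise)
mean_change = sum(a_shifted) / n - sum(a) / n
print(f"change in mean A after raising B by 2: {mean_change:.2f}")
```

With these assumed settings the correlation should land at roughly 0.2, while the mean of A rises by the assumed effect times the shift in B, that is by about 0.4. A weak correlation therefore does not rule out a causal link; it only says that B accounts for a small share of the variance in A.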

Structure the chaos: Normativity and plausibility

Are all swans white? This is probably one of the most famous examples of a universal theory that can be disproven by a single contradictory case. It may seem trivial, but the black swan is representative of Karl Popper's Falsificationism, an important principle of scientific work.

Building on the thought that our mind wanders to find causal relations, and then having the scientific experiment as a powerful tool, scientists started deriving and revising theories that were based on the experimental setups. Sometimes it was the other way around, as many theories were only later proven by observation or scientific experiments. Having causality explained by scientific theories created a combination that led to physical laws, societal paradigms and psychological models, among many other things.

Plausibility started its reign as a key criterion of modern science. Plausibility basically means that relations can only be causal if they are not only probable but also reasonable. Statistics takes care of the probability. But it is the human mind that derives reason out of data, making causality a deeply normative act. Counterfactual theories may later disprove the causality we assumed, which is why it has been argued that we cannot know the truth, but can only approximate it. Our assumptions may still be falsified later.

So far, so good. It is worth noting that Aristotle had some interesting metaphysical approaches to causality, as do Buddhists and Hindus. We will ignore these here for the sake of simplicity.

Hume's criteria for Causality

David Hume

It seems obvious, but a necessary condition for causality is temporal order (Neumann 2014, pp.74-78). Temporal causal chains can be defined as relations where an effect has a cause. An event A may directly follow an action B, and we then conclude that A is caused by B. Quite often, we think that we see such causal relations rooted in a temporal chain. The complex debate on vaccinations and autism can be seen as such an example. The opponents of vaccinations think that autism was caused by vaccination, while medical doctors argue that the onset of autism merely happens at the same age at which vaccinations are given as part of the necessary protection of our society. The temporal relation is in this case seen as a mere coincidence. Many such temporal relations are hence assumed, but our mind often deceives us, as we want to get fooled. We want order in the chaos, and for many people causality is bliss. History often tries to find causalities, yet as it was once claimed, history is written by the victorious, meaning it is subjective. Having a model that explains your reality is what many search for today, and having some sort of temporal causal chain seems to be one of the cravings of many human minds. Scientific experiments were invented to test such causalities, and human society evolved. Today it seems next to impossible to not know about gravity - in a sense we all know it - but it was the first physical experiments that helped us prove the point. The scientific experiment thus helped us to explore temporal causality, and medical trials can be seen as one of the most advanced continuations of this approach.

We may however build on Hume, who formulated his criteria for causality in A Treatise of Human Nature. Paraphrasing his words, cause and effect are contiguous in space and time, the cause is prior to the effect, and there is a constant union between cause and effect. It is worthwhile to consider the other criteria he mentioned.

1) Hume claims that the same cause always produces the same effect. In statistics, this points to the criterion of reproducibility, which is one of the backbones of the scientific experiment. This may seem difficult to uphold in times of single case studies, but it highlights that the value of such studies is clearly relevant, yet limited according to this criterion.

2) In addition, if several objects create the same effect, then there must be a uniting criterion among them causing the effect. A good example for this is weights on a balance. Several weights can be added up to have the same counterbalancing effect. One would need to pile up a high number of feathers to get the same effect as, let’s say, 50 kg of weight (and it would be regrettable for the birds). Again, this is highly relevant for modern science, as looking for uniting criteria among many cases is a key goal in the large amounts of data we often analyse these days.

3) If two objects have a different effect, there must be a reason that explains the difference. This third assumption of Hume is a direct consequence of the second one. What unites factors can be important, but equally important can be what differentiates factors. This is often the reason why we think there is a causal link between two things, when there clearly is not. Imagine some evil person gives you decaf coffee, while you are used to caffeine. The effect might be severe, and it is clear that the caffeine that wakes you up is what differentiates regular coffee from the decaffeinated one.

To summarize, causality is a mess in our brains. We are wrong about causality more often than we think: our brain is hardwired to find connections and often gets fooled into assuming causality. Equally, we often want to neglect causality when it is clearly a fact. We are thus wrong quite often. In case of doubt, stick with Hume and the criteria mentioned above. Relying on these when analysing correlations ultimately demands practice.


The author of this entry is Henrik von Wehrden.