The great statistics recap

From Sustainability Methods

Annotation: This entry concludes the Introductory class on statistics for Bachelor students at Leuphana. It is not a stand-alone entry.

The great recap

Within this module, we focused on learning simple statistics. Understanding basic descriptive statistics allows us to calculate averages, sums and many other measures that help us grasp the essentials of a given dataset. Knowing the different data formats is essential in order to understand the diverse forms that data can take. We learned that all data is constructed, which becomes most apparent when looking at indicators: an indicator can tell some story, yet without deeper knowledge about its construction - that is, its context - that story is hard to grasp.
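
As a minimal sketch, such descriptive measures can be computed in a few lines of Python; the dataset, the farm labels and the yield column below are entirely invented for illustration.

```python
# Minimal sketch: basic descriptive statistics on an invented dataset.
import pandas as pd

data = pd.DataFrame({
    "farm": ["A", "A", "B", "B", "C", "C"],
    "yield_t_ha": [4.2, 3.9, 5.1, 4.8, 3.5, 3.7],
})

print(data["yield_t_ha"].mean())   # average yield across all observations
print(data["yield_t_ha"].sum())    # total yield
print(data.groupby("farm")["yield_t_ha"].describe())   # summary per farm
```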

Once you get a hold of the diverse data formats, you can see how data can follow different distributions. While much quantitative data is normally distributed, there are also exceptions, such as discrete data of phenomena that can be counted, which often shows a skewed distribution. Within frequentist statistics, statistical distributions are key, because they allow us to test hypotheses from a statistical standpoint. You assume that data follows a certain distribution, and this assumption is often an important precondition for the test or model you want to apply. Whether your data shows non-random patterns - that is, whether a hypothesis can actually be accepted or rejected - more often than not depends on the p-value. This value indicates whether your results are random and follow mere chance, or whether there is a significant pattern of the kind you tested for. This is at its core what frequentist statistics are all about.
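
As a minimal sketch, such a distributional assumption can be checked in Python, for instance with a Shapiro-Wilk test for normality; the simulated data below is purely illustrative.

```python
# Minimal sketch: normally distributed data vs. skewed count data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
continuous = rng.normal(loc=10, scale=2, size=200)   # roughly normal measurements
counts = rng.poisson(lam=2, size=200)                # skewed count data

# The Shapiro-Wilk test checks the null hypothesis of normality.
print(stats.shapiro(continuous))   # large p-value: no evidence against normality
print(stats.shapiro(counts))       # small p-value: normality is rejected
```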

The simplest tests compare counts of groups within two variables (chi-square test), the variances of two distributions (f-test), and the mean values of two samples of a variable (t-test). Other tests, such as the Wilcoxon test, avoid the question of the statistical distribution by breaking the data into ranks, but these are applied less often. A breakthrough in statistics was the development of the correlation, which allows us to test whether two continuous variables are meaningfully related. If one of the variables increases and the other variable increases as well, we speak of a positive correlation. If one variable increases and the other one decreases, we speak of a negative correlation. The strength of this relation is summarised by the correlation coefficient, which ranges from -1 to 1; values further from 0 indicate a stronger relation, while 0 basically indicates complete randomness in the relation. This is again tested by a p-value, where once more a value smaller than 0.05 indicates a non-random relation, which in statistics is called a significant relation.
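
As a minimal sketch, the tests named above can be run with scipy; all data below is simulated and purely illustrative.

```python
# Minimal sketch: chi-square test, t-test, Wilcoxon test, correlation and f-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(10, 2, 50)
b = rng.normal(11, 2, 50)

print(stats.chi2_contingency([[20, 30], [35, 15]]))   # counts of groups within two variables
print(stats.ttest_ind(a, b))                          # compare the means of two samples
print(stats.wilcoxon(a, b))                           # rank-based comparison of paired samples
print(stats.pearsonr(a, a + rng.normal(0, 1, 50)))    # correlation coefficient and its p-value

# f-test for the variances of two distributions, computed by hand.
F = np.var(a, ddof=1) / np.var(b, ddof=1)
p = 2 * min(stats.f.sf(F, len(a) - 1, len(b) - 1),
            stats.f.cdf(F, len(a) - 1, len(b) - 1))
print(F, p)
```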

While correlation opened Pandora's box of statistics, it also created great confusion concerning the question of whether a relation is causal or not. There are clear criteria that indicate causality, such as when phenomena with similar features have the same effect on a variable. In order to statistically test for causal relations, regressions were developed. Regressions check for relations between variables, but revolve around a logical connection between these variables that allows for causal inference. In addition, they allow us to test not only the relation of one continuous independent variable to a dependent variable; several independent variables can be included, thus allowing us to build more complex models and test more advanced hypotheses. Again, the relation is indicated to be significant by the p-value. However, the strength of the model is not measured by a coefficient, but by the r-squared value, which expresses how much of the variance in the dependent variable is explained by the model, based on the squared distances of the individual data points from the regression line. The regression line is hence the line that represents the regression model and best explains the relation between the dependent and independent variable(s).
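
As a minimal sketch, such a multiple regression can be fitted with statsmodels; the variables (rainfall, fertiliser, crop yield) and the data below are invented for illustration.

```python
# Minimal sketch: multiple regression with two independent variables.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 100
rainfall = rng.normal(600, 50, n)
fertiliser = rng.normal(80, 10, n)
crop_yield = 0.01 * rainfall + 0.05 * fertiliser + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([rainfall, fertiliser]))  # intercept + two predictors
model = sm.OLS(crop_yield, X).fit()
print(model.summary())   # p-values per predictor and the r-squared of the model
```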

While regressions were originally designed to test clear hypotheses, these models are today utilised in diverse contexts, even in inductive research, thereby creating tensions when it comes to the interpretation of the model results. A significant regression does not necessarily indicate a causal relation. This is a matter of the normativity of the respective branch of science, and ultimately also a question of the philosophy of science. This is comparable to the analysis of variance (ANOVA), which unleashed the potential to conduct experiments, starting in agricultural research, yet quickly finding its way into psychology, biology, medicine and many other areas of science. The ANOVA allows us to compare several groups in terms of their mean values, and even to test for interactions between different independent variables. The strength of the model can be approximated by the amount of explained variance, and the p-value indicates whether the groups within the independent variables differ overall. One can, however, also test whether one group differs from another, thus comparing all groups individually by means of a post-hoc test (e.g. Tukey).
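
As a minimal sketch, a one-way ANOVA followed by a Tukey post-hoc test might look like this; the three groups below are simulated purely for illustration.

```python
# Minimal sketch: one-way ANOVA and Tukey post-hoc comparison of three groups.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)
control = rng.normal(10, 2, 30)
treat_a = rng.normal(12, 2, 30)
treat_b = rng.normal(12.5, 2, 30)

print(stats.f_oneway(control, treat_a, treat_b))   # do the group means differ overall?

values = np.concatenate([control, treat_a, treat_b])
groups = ["control"] * 30 + ["treat_a"] * 30 + ["treat_b"] * 30
print(pairwise_tukeyhsd(values, groups))           # which groups differ from which?
```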

When designing an ANOVA study, great care needs to be taken to have sufficient samples to allow for a critical interpretation of the results. Subsequently, ANOVA experiments became more complex, combining several independent variables and also allowing researchers to correct for so-called random factors, which are elements whose variance is partitioned out of the ANOVA model. This allows us, for instance, to increase the sample size in order to minimise the effects of variance in an agricultural experiment that is conducted on several agricultural fields. In this example, the agricultural fields are included as a block factor, which minimises the variance introduced by these replications. Hence, the variance of the agricultural fields is tamed by a higher number of replicates. This led to the ANOVA becoming one of the most relevant methods in statistics, yet recent developments such as the reproducibility crisis in psychology highlight that care needs to be taken not to overplay one's hand. Preregistering hypotheses and a greater recognition of the limitations of such designs currently pave the way towards a more critical future of statistical designs.
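
As a minimal sketch of accounting for such a block factor, the statsmodels formula interface can be used; the field and treatment labels below are invented, and the fields are included here as a simple fixed block factor rather than a full random-factor model.

```python
# Minimal sketch: an experiment replicated across fields, with field as block factor.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
fields = np.repeat(["field1", "field2", "field3", "field4"], 10)
treatment = np.tile(["control", "fertilised"], 20)
field_shift = np.array([{"field1": 0.0, "field2": 0.5,
                         "field3": -0.3, "field4": 0.8}[f] for f in fields])
crop_yield = (field_shift
              + np.where(treatment == "fertilised", 1.0, 0.0)
              + rng.normal(0, 0.5, 40))

df = pd.DataFrame({"crop_yield": crop_yield, "treatment": treatment, "field": fields})
model = smf.ols("crop_yield ~ treatment + field", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # variance split between treatment and block factor
```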

Another development that emerged during the last decades is the conduct of so-called real-world experiments, which are often singular case studies with interventions, yet typically with little or no control of variables. These approaches are slowly being developed in diverse branches of research, and open up a meta-analytical dimension, where the results of a high number of case studies are averaged. The combination of different studies enables a different perspective, yet currently such approaches are either restricted to rigid clinical trials or to meta-analyses with more variables than cases.

Real-world experiments are thus slowly emerging to bridge experimental rigour with the often perceived messiness of the problems we face and how we engage with them as researchers, knowing that one key answer lies in joint learning together with stakeholders. This development may allow us to move one step further in current systems thinking, where many phenomena we cannot explain are still simply labelled as complex. We will have to acknowledge which phenomena we may begin to understand in the future, and which phenomena we may never be able to fully understand. Non-equilibrium theory is an example where unpredictable dynamics can still be approached by a scientific theory. Chaos theory is another example, where it is clear that we may not be able to grasp the dynamics we investigate in a statistical sense, yet we may be able to label dynamics as chaotic, allowing a better understanding of our own limitations. Complexity is somewhere in between, leaning partly towards the explainable, yet also having stakes in the unexplainable dynamics we face. Statistics is thus at a crossroads, since we face the limitations of our approaches and have to become better at taking these into account.

Within statistics, new approaches are rapidly emerging, yet to date the dominance of scientific disciplines still haunts our ability to apply the most parsimonious model. Instead, the norms of our respective discipline still override our ability to acknowledge not only our limitations, but also the diverse biases we face as statisticians, scientists and people. Civil society is often still puzzled about how to make sense of contributions that originate in statistics, and we have to become better at contextualising statistical results and translating their consequences for other people. To date, there is a huge gap between statistics and ethics, and the 20th century has proven that a perspective restricted to numbers will not suffice, but instead may contribute to our demise. We need to find ways not only to create statistical results, but also to face the responsibility for the consequences of such analyses and interpretations. In the future, more people may be able to approximate knowledge through statistics, and be equally able to act on this knowledge in a reasonable sense, bridging societal demands with our capacity for change.


What was missing

Everybody who actively participated in this module now has a glimpse of what statistics is all about. I like to joke that if statistics is like the iceberg that sank the Titanic, then you now have enough ice for a Gin-Tonic, and you should enjoy that. The colleagues I admire for their skills in statistics spent several thousand hours of their life on statistics, some even tens of thousands of hours. By comparison, this module encapsulates about 150 hours, at least according to the overall plan. Therefore, this module focuses on knowledge. It does not include the advanced statistics that demand experience. Questions of model reduction, mixed effect models, multivariate statistics and many other approaches have hardly been touched upon, because this would simply have been too much.

In itself, this whole module is already a daring endeavour, and you were very brave to make it through. We never had such a course when we were students. We learned how to calculate a mean value, or how to do a t-test. That was basically it. Hence this course is designed to be a challenge, but it is also supposed to give you enough of an overview to go on. Deep insights and realisations happen in your head. We gave you a head start, and gave you the tools to go on. Now it is up to you to apply the knowledge you have, to deepen it, to transfer it into other contexts and applications, and thus to move from knowledge to experience. Repetition and reflection forge true masters. Today, there are still too few people willing to spend enough time on statistics to become truly versatile in this arena of science. If you want to go elsewhere now, fine. You have now learned enough to talk to experts in statistics, given that they are willing to talk to you. You have gained data literacy. You can build bridges; the problems we face demand that we work in teams, and who knows what the future has in store for you.

Nevertheless, maybe some of you want to go on, moving from apprenticeship to master level. Statistics is still an exciting, emerging arena, and there is much to be learned. A colleague of mine once said that I could basically "smell what a dataset is all about". I dare you to do better. I am sure that the level of expertise, skill and experience I gained is nothing but a stepping stone to deeper knowledge and more understanding, especially regarding the interconnectedness of us all. I hope that all of you find the way in which you can contribute best, and maybe some of you want to give statistics a try. If so, then the next section is for you.

How to go on

Practise, practise, practise. One path towards gaining experience - the one I took - was to analyse any dataset I could get my hands on. Granted, in the past this was still possible because the Internet was not yet overflowing with data, yet there is surely still enough out there to spend time with and learn from. The Internet is full of information on statistics. Not all of it is necessarily good, yet it is surely all worth checking out. After a few hundred hours of doing statistics you will realise that you develop certain instincts. However, in order to get there, I suggest you find some like-minded peers to learn and develop together with. I had some very patient colleagues and friends who were always there for a fruitful exchange, and we found our way into statistics together. It certainly helped to have a supervisor who was patient and experienced. Yet one of the biggest benefits I had was that I could play with other people's data. If you become engaged in statistics, people will seek your advice, and will ask you for help when it comes to the analysis of their data. The diversity of datasets was one of the biggest opportunities for learning I had, and this continues to this day. Having a large diversity of data rooted in the actual experience of other people can be ideal for building experience yourself.

If I can give you one last piece of advice: there is certainly still a shortage of people with experience in statistics, hence it may be this skill that allows you to contribute to the bigger picture later in life. What I am, however, most certain about is that there is next to no one who has both practical experience in statistics and a deeper understanding of ethics. While this may be the road less taken, I personally think it may be one of the most important ones we have ever faced. I might be biased, though. In the end, I started this Wiki not only to overcome this bias, but to invite others onto the road less taken.


The author of this entry is Henrik von Wehrden.