The future of statistics?
In short: This entry revolves around the future of statistics. Which issues need to be solved, and which role can and should statistical analyses play in science and society in the next years and decades? For more on statistics, please refer to this overview page.
- 1 What will be the future of statistics?
- 2 The future of statistics
- 3 The future of statistics within science
- 4 The future contribution of statistics to society
- 5 Epilogue
What will be the future of statistics?
Quite a few people apply statistics these days, and it has become a staple of many diverse branches of science. However, critical perspectives have also increased over the last decades, and the computer revolution and the vast increase in peer-reviewed publications triggered a diversification of statistics. Sadly, the critical perspective is often more of an add-on or afterthought, or settles into universal rejection, but has hardly triggered a deeper critical reflection or even a revolution within statistics and among the people who apply it regularly. Because of this, the benefits of these critical perspectives are often restricted to a few branches of science, or to the level of theory of science, yet do not translate into the empirical knowledge production where statistics is still so pivotal. The increasing diversity of statistics as such has also increased the mess, as it led to an increasing loss of linguistic coherence across the diverse disciplines, and to developments where knowledge production or the testing of theories relies on a small canon of statistical methods.
Statistics hence added to the ever-spiraling specialisation of the scientific disciplines, and contributed to the demise of the old kingdoms. This is reflected in the role that statistics played in the 20th century and even before, where the analytical view on numbers was a willing accomplice to the injustices and inequalities of human thriving. The current culture wars are a reflection of these same unsolved problems, to which statistics contributed, yet there is hope for the future. Statistics may be able to contribute to a better world for all people, but we need to overcome several struggles that lie within the branch of statistics itself, within science, and in the contribution that statistics can make to society through science.
In this text, I will propose my current view on how statistics may evolve in the future, how statistics may grow up into something better, and how it shall overcome the haunting problems it faces today. This text will thus first look at the future developments within statistics, trying to anticipate what could happen to evolve the arena of statistics. The second section will critically examine the role of statistics across disciplines, and which problems may be solved in the future. The last section will paint a picture of the future contributions of an evolved arena of statistics and its role in society. It is in the nature of this endeavour that this text is not only offering a highly subjective vision, but also a bold one. However, since next to no one dares to dream about the future of statistics, I shall give it a try.
The future of statistics
Plausible validation criteria
The question I get asked the most as a statistician is: How much does the model explain, and is that good or bad? Of course this is a loaded question that cannot be answered in general. Imagine the following example. You are terminally ill, and the doctors told you to settle your affairs. Suddenly a cure for your condition is discovered, but it only cures 10% of all patients. Do you take the shot? Sure you will. Now imagine you are in a different setting. You are perfectly healthy while planning your travels to the town of Samarra. Suddenly, Death knocks on your door, and tells you that you have a 10% chance of dying if you travel to Samarra. Would you follow through with your travel plans? Most certainly not. Yet one time you have a slim chance of survival, and the other time you have a slim chance of dying, and both are at 10%. This highlights that probability as a validation criterion is context-dependent. Thus, instead of focussing on absolute criteria, relative, context-dependent criteria seem more helpful. While Bayesian statistics would in theory bring us much closer to such a relative measure, in practice it still relies on validation criteria familiar from frequentist statistics. Still, Bayes' theorem may offer an answer in the form of a relative validation criterion. Unfortunately, this would demand that everybody understands Bayesian statistics, which is why we are talking about the rather distant future.
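The arithmetic behind the two scenarios can be made explicit in a small sketch. It rests on simplifying assumptions that are mine and not part of the example itself: that the terminally ill patient has a survival chance of 0 without the cure, and that the healthy traveller has a survival chance of 1 when staying at home. The function name `change_in_survival` is hypothetical and only serves the illustration.

```python
def change_in_survival(baseline, with_action):
    """Change in survival probability relative to doing nothing."""
    return with_action - baseline

# Assumed baselines: 0.0 for the terminally ill patient, 1.0 for the healthy traveller.
cure = change_in_survival(baseline=0.0, with_action=0.10)    # taking the shot
travel = change_in_survival(baseline=1.0, with_action=0.90)  # going to Samarra

print(f"cure:   {cure:+.2f}")
print(f"travel: {travel:+.2f}")
```

The same 10% shows up once as a gain and once as a loss, which is all that is meant here by a relative, context-dependent criterion.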
In statistics today, there is a tendency to use overly complex models as a sales pitch, as well as overly simple models due to a lack of experience. In the long run, however, knowledge and experience may spread, and people may use the most appropriate models. I think a starting point for this will be pursuing different models at the same time, not in an ensemble way where the model results are put together and averaged, but in a reflexive way. Imagine I see a weather development that may have ramifications for the nomads in an area, foreshadowing a drought that may affect their livestock. I could first use simple linear models, which would not help me to grapple with such an extreme event, but would allow me to see some long-term trends and get some explanation of what might happen in terms of the big picture. Then I might focus with a finer grain on the actual weather data right now, comparing climate and weather in a nested way, using non-linear statistics to focus on the actual prediction. Next, I could ask myself what happens if I take a fresh look, using a Bayesian approach to allow myself to grapple only with the current data, with real-time updates. All these models unravel different aspects of the same phenomenon, yet today - more often than not - statisticians decide on one way of analysis, or combine all ways (e.g. ensemble models) as an average. Yet I propose that much knowledge is like an onion, and different statistical models can help us to peel off the different layers.
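A minimal sketch of such a layered look at one data set, using only the Python standard library, could read as follows. The rainfall numbers, the dryness threshold and all function names are hypothetical. The second layer is a deliberately crude stand-in for the non-linear, fine-grained analysis described above, and the third layer uses a simple Beta-Binomial update as one possible Bayesian view among many.

```python
from statistics import mean

def ols_slope(y):
    """Layer 1: slope of a least-squares line through (0, y[0]), (1, y[1]), ..."""
    n = len(y)
    x_bar = (n - 1) / 2
    y_bar = mean(y)
    sxy = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(y))
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    return sxy / sxx

def recent_anomaly(y, window):
    """Layer 2: mean of the last `window` values minus the mean of all earlier values."""
    return mean(y[-window:]) - mean(y[:-window])

def beta_update(a, b, observations):
    """Layer 3: update a Beta(a, b) belief about the share of dry months, one month at a time."""
    for is_dry in observations:
        if is_dry:
            a += 1
        else:
            b += 1
    return a, b

# Hypothetical monthly rainfall in mm; the last four months are unusually dry.
rainfall = [60, 58, 62, 57, 59, 55, 56, 54, 30, 22, 18, 15]
DRY_THRESHOLD = 35  # assumed cut-off for calling a month "dry"

trend = ols_slope(rainfall)
anomaly = recent_anomaly(rainfall, window=4)
a, b = beta_update(1, 1, [r < DRY_THRESHOLD for r in rainfall[-4:]])
posterior_dry_share = a / (a + b)

print(f"long-term trend: {trend:.2f} mm per month")
print(f"recent anomaly:  {anomaly:.2f} mm")
print(f"posterior mean share of dry months: {posterior_dry_share:.2f}")
```

The three numbers are not averaged into one result. Each is read on its own, as one layer of the onion.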
Examine plausible questions
Much of the knowledge that science produces is driven by scientific demand, and this can be a good thing. However, our growth-driven economy shows the catastrophic shortcoming of a line of thinking that focuses on one aspect of knowledge, all the while actively rejecting or denying other parts of knowledge or reality. Empirical knowledge depends on the questions we ask. A manager of an organisation that attempts to maximise its profits may or may not ask the right questions for that goal, as focussing on growth may be the main reason for the organisation's demise or prosperity in the future. To the manager, focusing on growth seems plausible, so they pursue it. Whether it is reasonable, however, they might misjudge. Plausibility is by definition closely linked to people being reasonable, and plausible questions have some probability of being worth our investigation. Following Larry Laudan, however, it is clear that scientists of the past were not always reasonable. Laudan thought that many events and discoveries in the history of science were not as rational or reasonable as we think they were. Future changes such as a more tightly-knit exchange culture in science or more pronounced ethical checks of research proposals may foster more plausible research questions, but this may not be the right way to get there. Instead, society may shift in the future, and people may simply no longer come to the conclusion that implausible questions should be investigated. This may sound like a very distant future, but if we compare ourselves today to people or even researchers 100 years ago or even 500 years ago, it becomes clear that we have already come a long way, and shall go on.
Integrate more & diverse data
Statistics was born out of the urge to understand and control system dynamics, partly with a purely non-materialistic focus such as in astronomy - a motor of early statistics - or with a total focus on materialistic gain, as was the case in early econometrics. With the rise of the Internet, our influx of data increased at an exponential pace, reminiscent of Moore's law, triggering both positive and devastating effects on the global dynamics we witness today. Access to information has become one of the most important privileges people have today, and much has been gained, but much was also lost in translation. Examples such as information spreading from activists showcase what can be gained, yet terrorists and insurgents are equally coordinating themselves in the digital age. Data security is increasingly shifting into the focus of societal debate, but we are far away from real data security, and in many aspects seemingly only a step away from a Black Mirror episode. All the while it is clear that we have no idea which types of data may be available in the future. Movement patterns of elderly risk patients may predict the risk of a heart attack weeks before it would actually happen. Research can trigger a tilt towards fairer income distribution, or support the notion of a universal basic income. Detection systems for weather, oceans and the upper crust of the Earth may prevent countless losses of life in case of disasters. However, all these examples are still connected to our reality today, and we have to assume that future people will utilise data in ways that are hard for us to imagine today.
Complexity is almost like a holy grail in current debates about system dynamics, because many use complexity either as an argument why change might happen, or to underline our limitations in understanding system dynamics. I believe that both lines of thinking are capital mistakes, as they may lead to false argumentation or to inaction in investigating system dynamics. What is complex for us may be simple to future people. Also, while there will most likely always be dynamics or patterns that we do not understand, in the future we may have mapped out which dynamics can be understood, and which dynamics follow other principles such as chaos theory, and can thus not be predicted. Complexity is not an ontological truth, but a buzzword that frames our current limitations. Once we have mapped out what we cannot understand, we will be much better able to act within the remaining rest of reality. Many people right now frame complexity and knowledge about it as a privilege, which they have access to, but other people do not. I can declare something to be complex, and thus mark this pattern as not understandable. A good example is weather dynamics, which become unpredictable beyond a certain time window. While our current thinking tells us that this will always be the case, we are different people today than we were 100 years ago. We have weather forecasts, ensemble models, satellites in orbit, and many means to at least predict the weather for the next 1-2 weeks, which would have been unthinkable at a planetary scale some centuries ago.
The gap between causal links and predictive pattern detection is one of the largest missed opportunities of statistics to establish a proper link to philosophy. While many philosophers such as Hume explored matters of causality deeply, this is still widely ignored in large parts of the everyday life of research. The unlocking of machine learning and associated analytics even led to an outright rejection of any causal link, focussing on prediction instead. This can be a good thing, as a life saved through a good prediction is surely better than a life lost, even though a post mortem helps us to understand why the person died. Anyone would surely prefer a life saved by a blurry prediction to an explainable death. Mapping out this difference is of pivotal importance, because prediction can inform us so that we may decide how to act, yet causal explanation can help us understand why we should act. Causality is thus key to decisions and intentional acts. Without causality, we may be doomed to act based on our external senses alone, or on a prediction of some machine, yet it is our reflection based on causal information that may give our actions meaning and reason. It is part of human nature that some of our actions may never be explainable, and it is part of statistics that some patterns and mechanisms will never be explainable. Still, a proper embedding of causal information, and of how we define it, may give us reasons to act. Neither our actions alone nor reason alone matters, but the combination of acting reasonably based on causal information may matter deeply.
The future of statistics within science
Solve theory of science
Untangling the crisis of Western science for good is the moonshot on this list. The perceived ghost of positivism that is still haunting us to this day, and the counter-revolutionary movements that emerged out of it, triggered a division that left much of current research stuck in an illusion of objectivism, while some are lost in their maze of universal rejection or critical reflection. Critical realism, with its subjective view of scientific knowledge and its possibility of ontological truths still being out there, may have solved the current dilemmas of theory of science. Unfortunately, most researchers are not aware of this, or reject or ignore it. The errors of the past, and the biases these errors create inside of us as individuals as well as within the scientific community, deserve a critical perspective on science. Equally, we need to create knowledge to continue the path of this human civilization, since we unleashed many wicked problems that need to be solved. Otherwise all may be in vain, and science needs to acknowledge that. Statistics is probably one of the branches of science that is furthest away from critical realism, yet if we change our education systems to enable a reflexive humanism as a baseline for our education, I cannot see why critical realism should not spread, and ultimately prevail. From a current viewpoint, it looks like our best ticket to the moon, and beyond.
Establish postdisciplinary freedom
Scientific disciplines are a testimony to the oppressive evolution of science out of the Enlightenment, leading to the silos of knowledge that we call scientific disciplines. While it is clear that this minimises the chances of a more holistic knowledge production, scientific disciplines are still necessary from the perspective of depth of knowledge. Medicine is a good example where most researchers are highly specialised, because there is hardly any other way to contribute to the continuous evolution of knowledge. We may thus conclude that focus in itself is necessary, and often helpful. There are however also other factors about the existence of scientific disciplines that are important to raise. First of all, scientific disciplines are in a fight about priorities of knowledge and interpretation. Many disciplines claim that their knowledge is of a higher value than the knowledge of other disciplines. It is clear that this notion needs to be rejected once we take a step back and look at the whole picture, since such claims of superiority do not make any sense. Yet from a perspective of critical realism, one could claim that ethics and maybe even philosophy are on a different level, because they can transcend epistemological perspectives, and may even create ontological truths. While other disciplines may thus vanish in the future, philosophy, and more importantly ethics, are about our responsibility as researchers, and may thus play a pivotal role. I would propose that statistics could contribute to this end, because statistics is at its heart not disciplinary. Instead, statistics could provide a reflexive link between different domains of knowledge, despite being in almost the opposite position today, since statistics is often the methodological dogma of many scientific disciplines.
Clarify the role of theory
Statistics today is stuck between a rock and a hard place. Statistics can help to test hypotheses, leading to the acceptance or rejection of hypotheses that are rooted in our theories, which often makes deductive research rigid and incremental. At the extreme opposite end, there is the inductive line of thinking, which claims an open mind independent of theory, yet often still looks at the world through the lens of a theoretical foundation. Science builds on theory, yet the same theories can also lock us into a partial view of the world. This is not necessarily bad, yet the divide between inductive and deductive approaches has been haunting statistics just as it has many other branches of science. Some approaches in statistics almost entirely depend on deductive thinking, such as the ANOVA. Other approaches such as cluster analysis are largely inductive. However, all these different analyses can be used in both an inductive and a deductive fashion, and indeed they are. No wonder that statistics created great confusion. The ANOVA for example was a breakthrough in psychological research, yet the failure to reproduce many experiments highlights the limitations of the theories that are being pursued. Equal challenges can be raised for ecology, economics, and many other empirical branches of science. Only when we understand that our diverse theories offer mere partial explanations shall these theories settle into their proper places.
Reduce and reflect bias
Bias has been haunting research from the very beginning, because all humans are biased. Statistics has learned to quantify or even overcome some biases; for instance, those related to sampling or analysis are increasingly tamed. However, there are many more biases out there, and to this day most branches of science have had a rather singular focus on biases. In the future we may pool our knowledge and build on wider experience, and may learn to better reflect on our biases, and through transparency and open communication, we may thus reduce them. It seems more than unclear how we will do this, but much is to be gained.
Allow for comparability
How can we compare different dimensions of knowledge? To give an example, how much is courage worth in coin? Or my future happiness? Can such things be compared, and evaluated? Derek Parfit wrote that we are irrational in the way we value the distant future less than the present, even if we take the likelihood of this distant future becoming a reality into account. This phenomenon is called temporal discounting. Humans are strangely incapable of such comparisons, yet statistics have opened a door to a comparability that allows us to relate the comparisons in our heads to other comparisons, or in other words, to contextualise our perspectives. Temporal discounting already plays less of a role today because of teleconnections and global market chains. What would be more important, however, is if people gained - through statistics - a deeper insight into their existence compared to everybody else. Such a radical contextualisation of ourselves would surely change our perspective on our role in the world.
Evolve information theory
While frequentist statistics revolve around probability, there are other ways to calculate the value of models. Information theory is - in a nutshell - already focusing on diverse approaches to evaluate information gained through statistical analysis. The shortcomings of p-values have increasingly moved into focus during the last one or two decades, yet we are far away from alternative approaches (e.g. AIC) being either established or accepted. Instead, statistics is scattered when it comes to analysis pathways, and model reduction is currently - at least in the everyday application of statistics - still closer to philosophy than to our way of conducting statistics. The 20th century was somewhat ruled by numbers, and probability was what more often than not helped to evaluate the numbers. New approaches are emerging, and probability and other measures may be part of the curriculum of high school students in the future, leaving the more advanced stuff that we have no idea about today to higher education.
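To give an idea of what an evaluation based on AIC instead of p-values can look like, here is a minimal sketch in plain Python. It compares a model that only knows the mean with a simple linear model, using the common least-squares form of the AIC for Gaussian errors, n * ln(RSS / n) + 2k, which is valid up to an additive constant. The data and all function names are made up for illustration, and counting the error variance as one of the k parameters is one common convention, not the only one.

```python
import math
from statistics import mean

def gaussian_aic(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params

def residuals_mean_model(y):
    """Residuals of a model that predicts the overall mean for every point."""
    y_bar = mean(y)
    return [v - y_bar for v in y]

def residuals_linear_model(x, y):
    """Residuals of a simple least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical data with a clear linear trend plus some noise.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 15.3, 16.9, 19.2]

# Parameters counted: regression coefficients plus the error variance.
aic_mean = gaussian_aic(residuals_mean_model(y), n_params=2)
aic_line = gaussian_aic(residuals_linear_model(x, y), n_params=3)

print(f"AIC mean-only model: {aic_mean:.1f}")
print(f"AIC linear model:    {aic_line:.1f}")
```

The model with the lower AIC is the one that is preferred. Only the difference between the two values carries meaning, not the absolute numbers.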
The future contribution of statistics to society
Generate societal education
It is reasonable to assume that even advanced statistics may become part of the education of young schoolchildren. After all, today's curriculum is vastly different from what was taught 100 years ago. Statistics could be one stepping stone towards a world with a higher level of reflection, where more and more people can make up their own mind based on the data, and can take reflected decisions based on the information available. Inequalities can only be diminished if they are visible, and statistics is one viewpoint that can help to this end, not as a universal answer, but as part of the picture. The COVID-19 pandemic has shown how the demand for data, patterns and mechanisms went through the roof, and more people got into the data and analysis, and acted accordingly - given that they had the capability. The greatest opportunity of a more widely spread statistical education surely lies within other cultures. While Europe and North America are widely governed by knowledge gained from statistics, much could be learned by enriching these approaches with different knowledge from other cultures. We have only started to unravel the diversity of knowledge that is out there. Statistics may still be a staple in the future, yet knowledge becomes more exciting if it is combined with other knowledge.
In order to be able to contribute meaningfully to a societal as well as cross-cultural dialogue and joint learning, statistics would need to learn better ways to tell stories. Ultimately, it is joint stories that create identity and unite people. We should not be foolish and consider a so-called "objective" worldview propelled by statistics to be a meaningful goal, because this would surely deprive us of many dimensions of cultural diversity. Instead, statistics needs to emerge into an arena that can tell stories through data, and engage people, and thus help to create identities that are reflexive, immersive and ultimately aimed at understanding different perspectives. If statistics became less about privileged knowledge and instead took the understanding of all people as a baseline, we would have an ambitious yet tangible long-term goal.
Qualify statistical results
Right now, the general understanding of statistics is vastly different. Few people actually understand statistics, and often this is the small privileged group that is able to get a higher education. If statistics are presented, most people do not understand them, or understand only parts. A broader understanding of the different valuations of statistical results is hardly ever gained in societal debate, hence citizens rely on the translation and condensation by the media. In the future, qualifying statements that allow for a contextual understanding of statistics shall be commonplace, and more effort needs to go into building a broader understanding of what this actually means. Selective presentation of statistics will be a mistake of the past.
End information wars
Sadly, the selective presentation of statistics is part of our current reality. The media, politicians and other institutions present statistics often in a partial form to literally sell their view of the world. More often than not, a look at the whole data or other sources reveals a different picture - not just in nuances, but altogether different and sometimes even reversed. Much of the information that is ultimately presented publicly is designed to be palatable and to divide people. While there is surely a lot of good journalism already out there, much information is wrong, and political wars are waged about selling information, and how to utilise the partial realities for a political agenda. The Internet, which was originally set to realise the vision of global knowledge exchange, became a breeding ground for the great divide we currently face. We need to return to the vision of a free information flow, and educate people, and allow them to form their own picture, maybe even without the media as mediators.
Overcome resource fights
Many of the current conflicts are about resources, albeit often indirectly. Statistics enables or fuels these conflicts, as it allows for comparisons that ultimately have a utilitarian purpose, often for an in-group, thereby excluding other people that are designated as out-groups. Hence statistics is - besides cultural identities - one of the origin points of current conflicts. This may change in the future. Resources may become less scarce, e.g. through a different harnessing of energy. If we were not limited by energy constraints, many conflicts might be overcome. Equally, future societies may avoid singular aims of materialistic values, consumption and growth-driven economies. This would tilt the way statistics is embedded into society, yet would certainly make it no less important. Instead, statistics would shift to brightening our knowledge, instead of fueling inequalities.
Link people to analysis
Fundamental for overcoming inequalities between different people would be a different mode of transparency when it comes to data security, as well as to the way people demand data, depend on it, and ultimately utilise it. Power to the people is from this viewpoint not enough: we also need to give data and means of analysis to the people. Consequently, people also need to raise the question of which data they need in order to reflect and make decisions. Which data is being created is right now decided by a small elite, thereby excluding the vast majority of people. This could be changed already now, since more data is being created by the people thanks to smartphones, the Internet and other technology. This data should however not be used to manipulate people, but to empower them. Incredible amounts of data can be expected in the future, and science needs to work with citizens to focus on the data that is most needed to overcome the problems we may still face, and to find the solutions we need.
Much of the bad reputation of statistics stems from the fact that it is hardly ever explained what statistics cannot do. Today, much emphasis in the media is put on what statistics can do, and remarkably often, the media does not get the point. To this end, limitations are not about individual statistics and results, but about statistics in general. We know for instance that some statistical tests and models reveal different results than other models. All models are wrong, some models are useful. Society needs to understand and learn the value of statistics, but also its limitations. Otherwise, populist leaders will continue to devalue statistics with clever selective criticism rooted in the fear of some voters. It is in the nature of science that scientific results change, and nothing else can be expected from statistics.
Integrate data continuity
Changing statistical results may be one of the greatest changes we shall see in the future. Up until the recent past, most of the statistical results that were presented were temporal slices. However, there is an increasing temporal continuity emerging through dashboards on the Internet, leading to ever-changing results, which often gain in precision. Constant updates in statistical data and analysis will in the future enable us to react better and better to changes and to anticipate challenges. Through an integrated data continuity, we will finally come to a view that does justice to the world, with few fixed points besides our cultural identity and our moral principles. Anything beyond this may be a constant fluctuation of data, which is nothing but a testimony to the interconnectedness of all people. This will not only enable a new information age, but ultimately lead to a different dominion of insights, where we leave behind much of the static worldview that still closes in on us, and even traps us, today. Instead, we will become relative dots of information in space and time, where the interconnections between us are what ultimately matters, and we are able to see and feel this more than ever before.
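How a result can change with every new observation while gaining in precision may be sketched with a running estimate. The following uses Welford's online algorithm for the mean and variance, which is my choice for the illustration and is not tied to any particular dashboard. The class name, the method names and the stream of values are hypothetical.

```python
import math

class RunningEstimate:
    """Mean and standard error that are updated one observation at a time (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    def standard_error(self):
        if self.n < 2:
            return math.inf  # no precision estimate from a single observation
        variance = self._m2 / (self.n - 1)
        return math.sqrt(variance / self.n)

# Hypothetical stream of incoming measurements.
stream = [10, 12, 11, 13, 9, 10, 12, 11]

estimate = RunningEstimate()
history = []  # (n, mean, standard error) after each update
for value in stream:
    estimate.update(value)
    history.append((estimate.n, estimate.mean, estimate.standard_error()))

for n, current_mean, se in history:
    print(f"n={n}: mean={current_mean:.2f}, standard error={se:.2f}")
```

No result in this sketch is ever final. Each new value shifts the mean a little, and the standard error tends to shrink as the data accumulate.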
Epilogue
Much has been gained, yet some say that much is lost. Derek Parfit claimed that the future could be wonderful, and I would agree. We may - to follow Derek's line of thinking - not be able to imagine how future humans might be, act and think. However, it is hard to imagine a future of humanity where numbers do not matter at all. For all I care, they may matter less - proportionally - but will still be important.
To me, there is another aspect to it. Statistics could even contribute to making us more free. If you make hundreds of models that all show you patterns and mechanisms of diverse data sets, yet also show you that there are always limitations, and that there will always be unexplained variance, this may even be able to tilt your worldview. Seeing order in the chaos, finding patterns, unlocking mechanisms, and ultimately also mapping the limitations of our knowledge may be a step from knowledge to experience that can change us as a people; at least this is my naive hope. Many aspects I raised here seem to be unrealistic, or at least far-fetched. To this end, I would always consider how we were 100 years ago. What did we know about statistics? What about reason? What was causal knowledge back then? And how was the situation some 500 years ago? Or 2000 years ago? We have changed dramatically over the past decades and centuries, and our trajectory of change is on an exponential path right now. Who knows how it will go on, but I will always settle for hope. I am all with Martha Nussbaum on this one: “Hope really is a choice, and a practical habit.”
The author of this entry is Henrik von Wehrden.