Advanced Statistical Designs

In a nutshell: Advanced statistical designs demand experience, yet some main points relevant for these approaches are summarized here.

Over the last decades, statistics has established a diverse set of state-of-the-art approaches that form the norms for sampling and analysis in advanced statistical designs. These sophisticated and established statistical approaches are prominent throughout numerous branches of science, including psychology, ecology, the social sciences, economics and well beyond. While methodological plurality is continuously increasing, a form of statistical design has established itself and become a baseline for how scientific experiments are conducted and how complicated data are formalized and analyzed. Within this text, we give a brief introduction to realms of statistics that are deeply rooted in diverse experience. There are, after all, diverse developments within different disciplines, and despite some wider agreement this form of analysis is still evolving and far from solidified. Consequently, we present a normative perspective and therefore restrict our contribution to the most vital and robust aspects. What is written here is not only highly relevant but also presents a baseline that will, ideally in the near future, become more firmly established.

History of advanced designs

The type of advanced statistical designs in focus here emerged out of simpler experiments that started more than a century ago. With the rise of the analysis of variance (ANOVA), a statistical method became available that allowed us to conduct scientific experiments outside of the laboratory. Using replicates enabled both the taming and a better understanding of variance. Rising from agricultural experiments focussing on optimisation, these approaches were swiftly adapted in medicine, psychology and biology, and allowed for experiments of ever increasing complexity. It was initially in these simple design settings that a firm set of procedures became the norm. Only with the rise of so-called mixed effect models (see below) did a more sophisticated statistical modeling approach become available that allowed statistical results to be derived from more diverse predictor sets. Other statistical approaches such as ordinations allowed for dimension reduction of multivariate data, allowing redundancies to be taken into account. Mixed effect models spread throughout all quantitative science, and substantial experience was gained in designing advanced studies. Once mixed effect models became established as state of the art throughout many scientific disciplines, sample sizes progressively increased. Statistics often overplayed its hand, testing for 5-way interactions that are hard to explain, let alone to tame. Yet the biggest advancement was the availability of these models, which greatly increased with the spread of personal computers. The rise of modern computer hardware created a co-evolution of diverse statistical software solutions, triggering an exponential growth not only in applications of statistical models, but also in sample sizes. Humble beginnings with barely enough samples to allow for significance were replaced in the digital age by ever growing sample sizes, and not only clinical research but also clinical practice built its reputation on the evidence generated by modern statistics. Yet with great statistical power comes great responsibility. Things took a turn when it became clear that statistics had partly overplayed its hand, and it was medicine and psychology that led the way into a phenomenon often referred to as the reproducibility crisis. Despite the best efforts of diverse sciences, it became clear that the design of statistical experiments and the limitations of the derived results were intertwined well beyond our understanding. The early 21st century thus saw a backlash against statistics. p-hacking, that is, deliberately modeling until something - anything - significant is fished out of the data, was subsequently banned. Studies became increasingly pre-registered to prevent scientists from tinkering with the data until something publishable was derived. Instead, hypotheses were formulated and solidified in advance, and then either confirmed or rejected. Norms on how to deal with other problems such as multicollinearity or redundancy emerged. The latter describes a notorious problem in which variables show an alarming similarity in their explanatory power. Autocorrelation became increasingly recognised, meaning that spatial or other proximities in data structures may introduce bias or create ripples that lead to a wrong understanding of patterns or effects. What is more, algorithms emerged that can be grouped under the umbrella term "machine learning".
These chameleons among the models lend themselves to both inductive and deductive approaches, although they often promise more on the deductive end than they can deliver. Advanced statistical designs are thus a counterpoint still needed within modern statistics and analyses, and despite all the big data that seems to mushroom everywhere, there is a time for strict and rigid deduction within clinical trials, if not beyond. Some questions within science will always demand rigorous testing, and the next decades will show how AI and its chameleon children from the realms of machine learning will alter the demand for clear and planned yes or no answers.

What characterizes advanced statistical designs?

All deduction starts with a clear taming of the already known and the parameters that frame this knowledge. Advanced statistical designs stand like no other knowledge in science on the shoulders of giants. Clear knowledge of the question that needs answering, or of the gap that exists within a branch of science, is the first and foremost precondition for creating a proper study design. The methodological canon of such approaches is usually rigorous and tested up to a point that seems almost dogmatic, for the simple reason that all errors and biases are anticipated as well as possible in an attempt to minimize what cannot be explained. The simplest form of any advanced deductive statistical model is that something is explained by something else. Be it y~x or dependent vs. independent: the supposedly causal inference between two variables is the smallest common denominator of any deductive model. The main components to consider within an advanced statistical design are 1) a well-constructed sample design, 2) a parsimonious set of predictors, 3) relevant influencing factors that are taken into account as random factors or variables, and lastly 4) a sample that is large enough, does not violate ethical considerations, and is representative of the patterns and processes being investigated. In the following, let us go through these components point by point. A whole book would not suffice to explain all the details and caveats that come with experience, yet this entry may serve as a modest start.
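To make the y~x notation concrete, here is a minimal sketch in Python; the use of statsmodels and the simulated data are assumptions made purely for illustration, not part of any prescribed workflow.

```python
# Minimal sketch of the simplest deductive model: y explained by x.
# The data are simulated here purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
x = rng.normal(size=100)                        # hypothetical independent variable
y = 2.0 * x + rng.normal(scale=1.0, size=100)   # dependent variable with noise

data = pd.DataFrame({"x": x, "y": y})
model = smf.ols("y ~ x", data=data).fit()       # the y ~ x formulation from the text
print(model.summary())
```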

1) Sample design

There are well established procedures in place that allow for a sophisticated sample design. Disciplines such as medicine, psychology and ecology are deeply built around deductive approaches, and almost all publications utilizing such sample designs build on previous experience and on open questions that arise out of the current state of the art. Knowledge of previous effect strengths and of caveats in the existing literature is a pivotal starting point for creating an advanced statistical design. Nowadays, we can distinguish between two main schools of thought: a) purely designed sample approaches, and b) opportunistic sample approaches that build on already existing data.

a) The first revolves widely around degrees of freedom, meaning a well outlined calculation of how many factor levels and other determinants are needed to evaluate the variables, and how these can be meaningfully combined. In addition, a power analysis is often performed, which estimates how large the sample needs to be to come to meaningful results. This has established itself over the last years and has become a baseline, for instance in psychology. Yet it is often built around p-values, which can be criticized for many reasons, and even a large sample may show significant but weak effects. Hence it is necessary to remember that we are looking at one version of reality: a sample that explains a lot statistically may still be meaningless in terms of its practical outcome, and it may differ from a study with a small explanatory effect that is highly meaningful. Terminally ill patients may take even the smallest effect if it transports hope, while the strong and negative effect of smoking has long been known - and yet have people stopped smoking? Statistical power and value of meaning are connected, but they are not the same. This thought is best considered before creating a sophisticated statistical design, and consultation of the existing literature may be essential to anticipate what one might expect.
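A hedged sketch of such an a priori power analysis is shown below, using statsmodels for a two-group comparison; the effect size of 0.5 (Cohen's d) is an assumed value that would normally come from the existing literature.

```python
# Estimate the per-group sample size needed to detect an assumed effect
# with conventional significance and power thresholds.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # assumed medium effect (Cohen's d)
                                    alpha=0.05,       # conventional significance level
                                    power=0.8)        # conventional target power
print(f"Required sample size per group: {n_per_group:.0f}")
```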

b) The second form of design consists of sampling strategies that widely utilize, or at least mostly build on, already existing data. This line of work is prominently known from meta-analyses, which are specific designs that average effects across a diversity of studies or cases. Other examples emerge out of the growing formation of large datasets that compile already existing and often published data into larger collections of diverse data. We are currently witnessing an opportunistic explosion of analysis patterns, and a battleground of algorithms and statistical analyses. While more and more analysis shifts towards black-box approaches, this is a decisive point in time, since the next one or two decades will decide whether human operators still have a say in - and an understanding of - pattern generation, or whether AI and machine learning will take this over entirely. Arguments of causality may prevail after all, and tame the difference between a sublime but causally unknown model fit and a causal understanding. Here lies the generation of sophisticated statistical designs. It may ultimately fall to key disciplines to decide what is best in any specific case. Yet advanced statistical designs may continue to prosper in times of abundant data if scientists decide that model fit is not all that is needed, and that a deep and deliberately planned understanding is what helps us not to predict, but to explain.
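To illustrate the kind of averaging that meta-analyses perform, here is a minimal sketch of fixed-effect (inverse-variance) pooling; the effect sizes and standard errors are hypothetical values, not drawn from any real studies.

```python
# Fixed-effect meta-analytic pooling: studies with smaller standard errors
# receive larger weights in the pooled estimate.
import numpy as np

effects = np.array([0.30, 0.45, 0.12, 0.50])   # hypothetical study effect sizes
se = np.array([0.10, 0.15, 0.08, 0.20])        # hypothetical standard errors

weights = 1.0 / se**2                          # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
print(f"Pooled effect: {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI half-width)")
```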

2) A parsimonious set of predictors

The second component of advanced statistical designs is the generation and selection of a parsimonious set of predictors. This is where the existing literature is again vital to help us understand what we actually want to test our dependent variable against. Statistically, factor levels are more static, yet there is clear room for such qualitative measures; continuous quantitative predictors, in turn, demand much larger samples, since they are linked to regressions. Beside these two general differences in data format, the predictor set demands a clear recognition of independence. Predictor variables should not show artifacts of the dependent variables that are to be explained, which is a common rookie mistake. Equally, all predictor variables need to be independent of each other, meaning that they do not explain the same variance. In opportunistic designs building on existing data this is a frequent problem, as many predictors are somewhat intertwined. There are many measures to check for such phenomena, summarized as multicollinearity, which is defined as the relation between predictor variables. The easiest tool to check this for quantitative variables are correlations. To the same end, ordinations offer a more open measure, and the variance inflation factor is a univariate measure that can check for multicollinearity; some statisticians avoid these problems altogether by filtering the data through ordinations as a means of dimension reduction. Yet also in the sophisticated design of clinical trials such preconditions have often been violated, and only later knowledge may allow us to unravel how predictor variables are causally linked. Here, we shall just conclude that independence initially needs to be seen in a statistical sense. Yet independence is not only relevant for the predictor variables; these also need to show clear independence from random variables, which leads us to the third component of advanced statistical designs.
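A sketch of the two simplest checks mentioned above - a pairwise correlation matrix and the variance inflation factor (VIF) - might look as follows; the predictor set is simulated, with x2 deliberately constructed to be correlated with x1.

```python
# Check a hypothetical predictor set for multicollinearity via
# pairwise correlations and the variance inflation factor.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=200)   # deliberately correlated with x1
x3 = rng.normal(size=200)
predictors = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

print(predictors.corr().round(2))                 # pairwise correlations

X = add_constant(predictors)                      # VIF calculation needs an intercept column
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, i), 2))
```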

3) Consider influencing factors

Almost all advanced statistical designs contain random factors, either as a means to increase the sample size through replicates, or as a measure to reduce the impact of a certain pattern or process that is explicitly calculated out of the variance of the analysis. This may be the case, for instance, in a nested sample design, where hospitals are located in specific countries yet several hospitals are included per country. For many - mainly social and economic - reasons, effect strengths differ between hospitals in different countries, and in order to deal with the complexity of factors at play, countries are introduced as random factors into the design. Another prominent example are different agricultural fields within ecological experiments. In psychology and medicine the art of declaring variables as random factors has thrived the most, and there are many extremely sophisticated designs that are rooted in decades of scientific research. Hence random factors are often a question of topical experience and norms, as many such random factors are the same among dozens if not hundreds of studies. Certain norms became established, underlining the reign of advanced statistical designs. Another example of the sophistication of such approaches are longitudinal studies, which take the time of the specific measurements of dependent or independent variables into the model as a random intercept. This allows us to minimize, or at least take into account, effects or changes over time, thus minimizing the effects of repeated samples. An example would be a drug used to treat a disease, where the drug only becomes potent over time and this effect differs between patients for unknown reasons. In this case, a mixed effect model is well able to take such effects into account. The same holds for spatial autocorrelation effects, which can be calculated out of the variance of such models. Spatial autocorrelation haunts all scientific models that operate in spatial systems, since almost none of them operate in linear and gradual spaces; instead they show jumps and deviances, and these translate and feed into spatial autocorrelation effects. We only scratch the surface of processes that are far from being entirely solved, yet advanced statistical designs have become better over time, and longitudinal studies or autocorrelation corrections are testimony to this development.
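A minimal sketch of such a random intercept, loosely mirroring the hospitals-within-countries example, is shown below using statsmodels; the simulated data and the variable names (outcome, treatment, country) are assumptions made purely for illustration.

```python
# Random-intercept mixed effect model: country is treated as a random factor,
# so between-country variation is separated from the treatment effect.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
countries = np.repeat([f"country_{i}" for i in range(10)], 30)
country_effect = np.repeat(rng.normal(scale=0.5, size=10), 30)   # country-level variation
treatment = rng.integers(0, 2, size=300)
outcome = 1.0 * treatment + country_effect + rng.normal(size=300)

data = pd.DataFrame({"outcome": outcome, "treatment": treatment, "country": countries})
model = smf.mixedlm("outcome ~ treatment", data, groups=data["country"]).fit()
print(model.summary())   # random intercept per country absorbs between-country variance
```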

4) Sample size

The biggest challenge in almost all designs is to find and deal with a proper and unbiased sample size. Statistical effects demand a certain sample size. The interaction between different predictor variables and random factors can quickly lead to an exponentiation of the sample, demanding tremendous efforts to arrive at a proper sample size. This is when parsimony needs to be grounded in deep experience: samples that are too large may be unfeasible or even ethically contested, for example because sampling may involve a medical procedure that is not entirely harmless. Overall, most sampling is costly, or at least not free. Also, any given sample needs to represent the larger population. Most medical trials exclude pregnant people because of the associated dangers. Ethnicity is a contested component that also crucially needs to be considered within clinical trials. Harm may need to be considered in animal studies, and many would argue that these should not be conducted to begin with. Hence sample size is a phenomenon that needs to be examined on a case-by-case basis. Textbook rules, such as thirty samples per included interaction, originate from very specific contexts that cannot be superimposed on other contexts. Therefore, experience is needed, and any scientist utilizing advanced statistical designs needs to stand on the shoulders of giants.
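As a rough back-of-the-envelope illustration of how crossed factors inflate the required sample, consider the following sketch; the chosen factor levels and the thirty-samples-per-cell rule of thumb are illustrative assumptions, not a recommendation.

```python
# How quickly a fully crossed design exponentiates the required sample size.
factor_levels = [3, 2, 4]        # hypothetical levels of three crossed factors
samples_per_cell = 30            # the textbook rule of thumb mentioned above

cells = 1
for levels in factor_levels:
    cells *= levels              # fully crossed: 3 * 2 * 4 = 24 cells
print(f"{cells} cells x {samples_per_cell} samples = {cells * samples_per_cell} in total")
```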

Analysis

This text would not be complete without a note of caution. There are currently several developments in statistics that create ripples affecting advanced statistical designs within their scheme of analysis. Machine learning and AI are examples of analytical approaches that are purely predictive and often utilized in a more inductive fashion. Yet within the part of statistics that is typically used to analyze such deductively designed datasets, there are also several things at play. For a start, there is a difference between an analysis that builds on frequentist statistics and one that builds on Bayesian statistics. This creates further ripples in terms of the diverse modeling approaches (GLMM, GAMM etc.) that can be applied. There is also the question of the usage of p-values vs. information-theoretical approaches (e.g. AIC, BIC), and equally other questions about measures of model performance or evaluation. Hence not only the design of a study is crucial for its outcome, but also the means and pathways of analysis. P-values are increasingly vanishing, yet they are still the basis of the gross majority of published statistics. There are some thoughts on model reduction already written elsewhere, yet it is important to acknowledge that all of this is deeply normative and, at the moment, also entrenched within disciplines. It is clearly well beyond one text to solve this matter.
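To give one concrete flavour of an information-theoretical approach, here is a minimal sketch comparing two candidate models by AIC; the simulated data and variable names are assumptions for illustration only, and AIC is just one of the criteria mentioned above.

```python
# Compare a simpler and a more complex candidate model via AIC;
# the lower AIC indicates the better trade-off between fit and complexity.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
x1 = rng.normal(size=150)
x2 = rng.normal(size=150)
y = 1.5 * x1 + rng.normal(size=150)      # x2 has no true effect in this simulation

data = pd.DataFrame({"y": y, "x1": x1, "x2": x2})
simple = smf.ols("y ~ x1", data=data).fit()
complex_ = smf.ols("y ~ x1 + x2", data=data).fit()

print(f"AIC simple:  {simple.aic:.1f}")
print(f"AIC complex: {complex_.aic:.1f}")
```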

The future of advanced statistical designs

It was already mentioned that current statistical designs have to prove that they offer approaches that differ from modern machine learning algorithms, which are currently steeply on the rise both in terms of development and application. However, deductive approaches demand a deeply experienced and carefully constructed array of approaches. Just as a painting usually consists of many colors, our diversity of analytical approaches consists of an ever-growing and colorful palette. Time will tell whether advanced statistical designs will remain a staple in the deductive branches of modern science, and how they will tackle problems of autocorrelation, multicollinearity, or an uncountable array of biases in our samples.