Experiments

From Sustainability Methods

History of laboratory experiments

Starting with Francis Bacon, there was a theoretical foundation for shifting previously largely unsystematic experiments into a more structured form. With the rise of disciplines during the Enlightenment, experiments thrived, also thanks to the increasing amount of resources available in Europe due to the Victorian age and other effects of colonialism. Deviating from the more observational studies in physics, astronomy, biology and other fields, experiments opened the door to the wide testing of hypotheses. All the while, Mill and others built on Bacon to develop the necessary basic debates about so-called facts, building the theoretical basis for evaluating the merit of experiments. These systematic experimental approaches hence aided many fields such as botany, chemistry, zoology and physics. Even more importantly, these fields created a body of knowledge that kickstarted many fields of research and solidified others. The value of systematic experiments, and consequently of systematic knowledge, created a direct link to the practical application of that knowledge. The scientific method (so called in ignorance of any methods besides systematic experimental hypothesis testing and standardisation in engineering) hence became the motor of both the late Enlightenment and industrialisation, providing a crucial link between the Enlightenment and modernity.
Due to the demand for systematic knowledge, some disciplines matured, meaning that dedicated departments were established, including the laboratory spaces necessary to conduct experiments. The main focus was to conduct experiments that were as reproducible as possible, ideally with 100 % confidence. Laboratory conditions thus aimed at keeping conditions constant and manipulating ideally only one or a few parameters, which were then tested systematically. Necessary repetitions were conducted as well, but were of less importance at that point.
Many of the early experiments were hence rather simple, but produced knowledge that was more generalisable. There was also a general tendency for experiments to either work or not, which is to this day a source of great confusion, as a trial and error approach, despite being a valid approach, is often confused with a general mode of “experimentation”. In this sense, many people consider repairing a bike without any knowledge about bikes whatsoever a mode of “experimentation”. We therefore highlight that experiments are systematic. The next big step was the provision of certainty and of ways to calculate uncertainty, which came with the rise of probability statistics.


Key concepts of laboratory experiments – sampling data in experimental designs

Statistics enabled replication as a central principle, which was first implemented in laboratory experiments. Replicates are basically repetitions of the same experiment in order to determine whether an effect is constant or has a specific variance. This variance is an essential feature of many natural phenomena, such as plant growth, but it can also stem from errors such as measurement uncertainty. Hence the validity and reliability of an experiment could be better controlled. Within laboratory experiments, control of certain variables is essential, as this is the precondition for statistically testing the few variables that are the focus of the investigation. Control of variables means that such controlled variables are held constant, so that only the variables being tested contribute to the variance in the analysis. Consequently, such experiments are also known as controlled experiments. By increasing the sample size, it is possible to test the hypothesis at a certain probability and to generate a measure of reliability. The larger the sample, the higher the statistical power. Within controlled experiments, the so-called treatments are typically groups, and even continuous gradients are converted into factors. An example would be the amount of fertilizer, which can be grouped into “low”, “middle” and “high” amounts. This allows for systematic testing based on a smaller number of replicates. The number of treatments or factor levels defines the degrees of freedom of an experiment. The more levels are tested, the higher the number of samples needs to be, which can be calculated based on the experimental design. Scientists therefore design their experiments very clearly before conducting the study, and in many scientific fields such experimental designs are even submitted to a precheck and registration to ensure transparency and minimize potential flaws or manipulations.
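The grouping of a continuous gradient into factor levels, and the resulting degrees of freedom, can be sketched in a few lines of Python. This is only an illustration: the fertilizer amounts, the cut-off values and the function name `fertilizer_level` are assumptions made for this example and do not come from a real experiment.

```python
from collections import Counter


def fertilizer_level(grams, low_max=10.0, high_min=20.0):
    """Map a continuous fertilizer amount onto a three-level factor.

    The cut-offs (10 g and 20 g) are hypothetical and chosen for illustration.
    """
    if grams < low_max:
        return "low"
    if grams < high_min:
        return "middle"
    return "high"


# Hypothetical fertilizer amounts in grams, one value per pot.
amounts = [2.0, 5.0, 8.0, 12.0, 15.0, 18.0, 22.0, 25.0, 30.0]
levels = [fertilizer_level(g) for g in amounts]

counts = Counter(levels)   # number of replicates per factor level
k = len(counts)            # number of factor levels
n_total = len(levels)      # total number of samples

df_between = k - 1         # degrees of freedom of the factor
df_within = n_total - k    # residual degrees of freedom

print(counts)
print("df between:", df_between, "df within:", df_within)
```

With three factor levels and three replicates each, the factor uses two degrees of freedom and six remain for the residuals. Adding a further level without adding pots would reduce the residual degrees of freedom, which is why more levels demand more samples.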
Such experimental designs can become even more complicated when interaction effects are considered. In such experiments, two different factors are manipulated and the interactions between the different levels are investigated. A standard example would be the quantification of the growth of a specific plant species under different watering levels and amounts of fertilizer. Taken together, it is vital for researchers conducting experiments to be well versed in the diverse dimensions of the design of experiments. Sample size, replicates, factor levels, degrees of freedom and statistical power all need to be considered when conducting an experiment. Becoming well versed in designing such studies takes practice.
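As a rough sketch of how such a two-factor design can be laid out before the study, the following Python snippet crosses watering and fertilizer levels, assigns replicates and counts the degrees of freedom. The level names, the number of replicates and the shuffling of pot positions with a fixed seed are assumptions made for illustration only.

```python
import itertools
import random

watering = ["low", "high"]               # hypothetical watering levels
fertilizer = ["low", "middle", "high"]   # hypothetical fertilizer levels
replicates = 4                           # assumed number of pots per combination

# One entry per pot: every combination of the two factors, repeated per replicate.
design = [
    {"watering": w, "fertilizer": f, "replicate": r}
    for w, f in itertools.product(watering, fertilizer)
    for r in range(1, replicates + 1)
]

# Assign pots to positions in random order; the seed only makes the sketch repeatable.
rng = random.Random(42)
rng.shuffle(design)
for position, pot in enumerate(design, start=1):
    pot["position"] = position

n_cells = len(watering) * len(fertilizer)   # number of treatment combinations
n_total = len(design)                       # total number of pots

df_watering = len(watering) - 1
df_fertilizer = len(fertilizer) - 1
df_interaction = df_watering * df_fertilizer
df_residual = n_total - n_cells

print("pots needed:", n_total)
print("df:", df_watering, df_fertilizer, df_interaction, df_residual)
```

The total number of pots is the product of both level counts and the number of replicates, so each additional level of either factor multiplies the required sample size.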

Analysis of Variance

One key analysis tool of laboratory experiments, but also of other experiments as we shall see later, is the so-called Analysis of Variance (Anova). Invented by Fisher, this statistical test, mechanically speaking, compares the means of more than two groups, thereby extending the t-test beyond its restriction to two groups. Comparing different groups thus became a highly important procedure in the design of experiments, which besides laboratories is also highly relevant in greenhouse experiments in ecology, where conditions are kept stable through a controlled environment. The general principle of the Anova is rooted in hypothesis testing. An idealized null hypothesis is formulated, against which the data is tested. If the Anova gives a significant result, the null hypothesis is rejected, since data like this would be statistically unlikely if the null hypothesis were true. As one obtains an overall p-value, it can thus be assessed whether the different groups differ overall. Furthermore, the Anova allows for a measure beyond the p-value through the sum of squares calculations, which show how much of the variance is explained by the tested factors and how large the residual or unexplained variance is in relation.
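The sum of squares calculations behind a one-way Anova can be sketched with the Python standard library alone. The plant heights below are invented for illustration, and the sketch stops at the F statistic: the p-value would follow from the F distribution with the two degrees of freedom shown, which is best left to a statistics package.

```python
from statistics import mean

# Hypothetical plant heights (cm) for three fertilizer levels.
groups = {
    "low": [10.0, 12.0, 11.0, 13.0],
    "middle": [14.0, 15.0, 13.0, 16.0],
    "high": [18.0, 17.0, 19.0, 20.0],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Variation of the group means around the grand mean (explained part).
ss_between = sum(
    len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
)
# Variation of single values around their own group mean (residual part).
ss_within = sum(
    (x - mean(values)) ** 2 for values in groups.values() for x in values
)
ss_total = ss_between + ss_within

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# Ratio of the mean squares: explained variance relative to residual variance.
f_statistic = (ss_between / df_between) / (ss_within / df_within)
explained_share = ss_between / ss_total

print("F =", round(f_statistic, 2), "on", df_between, "and", df_within, "df")
print("share of variance explained:", round(explained_share, 3))
```

A large F value means that the differences between the group means are large compared to the scatter within the groups. The explained share is the measure beyond the p-value mentioned above.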

Preconditions

Regarding the preconditions of the Anova, it is important to realize that the data should ideally be normally distributed on all levels, which however is often violated due to small sample sizes. Since a non-normal distribution may influence the outcome of the test, boxplots are a helpful visual aid, as they offer a simple way to detect factor levels that are not normally distributed. Equally, the variance should ideally be comparable across all levels, which is called homoscedasticity. What is also important is the criterion of independence, meaning that samples of factor levels should not influence each other. For this reason, plants in ecological experiments are typically planted in individual pots. In addition, the classical Anova assumes a balanced design, which means that all factor levels have an equal sample size. If some factor levels have fewer samples than others, this may interact with the assumptions of normal distribution and comparable variance, but there is another effect at play. Larger sample sizes on one factor level may create an imbalance, where factor levels with larger samples exert a larger influence on the overall model result.
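Two of these preconditions, a balanced design and comparable variances, can be inspected with a few lines of Python before running the test. The data are hypothetical, and the ratio of the largest to the smallest variance is only a rough descriptive check chosen for this sketch, not a formal test.

```python
from statistics import variance

# Hypothetical plant heights (cm); the "high" level has lost one pot.
groups = {
    "low": [10.0, 12.0, 11.0, 13.0],
    "middle": [14.0, 15.0, 13.0, 16.0],
    "high": [18.0, 17.0, 19.0],
}

sizes = {level: len(values) for level, values in groups.items()}
variances = {level: variance(values) for level, values in groups.items()}

# A design is balanced when every factor level has the same sample size.
balanced = len(set(sizes.values())) == 1

# A ratio close to 1 suggests comparable variances across the levels.
variance_ratio = max(variances.values()) / min(variances.values())

print("sample sizes:", sizes)
print("balanced design:", balanced)
print("largest / smallest variance:", round(variance_ratio, 2))
```

In this example the design is unbalanced because one level has only three samples, which is exactly the situation in which the other levels would weigh more heavily in the overall model result.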

One-way and two-way Anovas

Single-factor analyses, which are also called one-way Anovas, investigate one factor variable while all other variables are kept constant. Depending on the number of factor levels, these demand a so-called randomization, which is necessary to compensate, for instance, for microclimatic differences under lab conditions. Designs with multiple factors, such as two-way Anovas, test two or more factors, which then demands testing for interactions as well. This increases the necessary sample size multiplicatively, and the degrees of freedom may increase dramatically depending on the number of factor levels and their interactions. An example of such an interaction effect might be an experiment where the effects of different watering levels and different amounts of fertiliser on plant growth are measured. While both increased water levels and higher amounts of fertiliser might each increase plant growth slightly, the increase of both factors jointly might lead to a dramatic increase in plant growth.
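Such an interaction can be made visible by comparing cell means, as in the following Python sketch. The growth values are invented so that the joint effect is clearly larger than the two separate effects, and the sketch only describes the pattern without testing it for significance.

```python
from statistics import mean

# Hypothetical plant growth (cm) for each combination of watering and fertiliser.
growth = {
    ("low water", "low fertiliser"): [10.0, 11.0, 9.0],
    ("high water", "low fertiliser"): [12.0, 13.0, 11.0],
    ("low water", "high fertiliser"): [12.0, 12.0, 12.0],
    ("high water", "high fertiliser"): [24.0, 25.0, 23.0],
}
cell_means = {cell: mean(values) for cell, values in growth.items()}

# Effect of more water when fertiliser is low ...
water_effect_low_fert = (
    cell_means[("high water", "low fertiliser")]
    - cell_means[("low water", "low fertiliser")]
)
# ... and when fertiliser is high.
water_effect_high_fert = (
    cell_means[("high water", "high fertiliser")]
    - cell_means[("low water", "high fertiliser")]
)

# Without an interaction both effects would be equal, so their difference would be zero.
interaction = water_effect_high_fert - water_effect_low_fert

print("water effect at low fertiliser:", water_effect_low_fert)
print("water effect at high fertiliser:", water_effect_high_fert)
print("interaction (difference of differences):", interaction)
```

Here the effect of watering depends strongly on the amount of fertiliser, which is what a two-way Anova would pick up as an interaction term.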

Interpretation of Anovas

Boxplots provide a first visual clue as to whether certain factor levels might be significantly different within an Anova analysis. If the box of one factor level lies entirely above or below the median of another factor level, this is a good rule of thumb for a significant difference. When making such a graphically informed assumption, we have to check carefully whether the data is normally distributed, as skewed distributions might interfere with this rule of thumb. The overarching guideline for the Anova is thus the p-value, which gives an overall significance regarding the difference between the factor levels. It can however also be relevant to compare the difference between specific groups, which is done with a post-hoc test. A prominent example is the Tukey test, where two factor levels are compared, and this is done iteratively for all combinations of factor levels. Since this poses a problem of multiple testing, the p-values need to be adjusted accordingly, for instance through a Bonferroni correction. Mechanically speaking, this is comparable to conducting several t-tests between pairs of factor levels and adjusting the p-values to account for the effects of multiple testing.
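The adjustment for multiple testing can be illustrated with a Bonferroni correction in Python. The unadjusted p-values below are invented placeholders standing in for the results of pairwise comparisons, and the significance level of 0.05 is assumed as a common convention. This sketch is not an implementation of the Tukey test itself.

```python
from itertools import combinations

levels = ["low", "middle", "high"]

# All pairwise combinations of factor levels: k * (k - 1) / 2 comparisons.
pairs = list(combinations(levels, 2))
n_tests = len(pairs)

# Hypothetical unadjusted p-values, one per pairwise comparison.
raw_p = {
    ("low", "middle"): 0.030,
    ("low", "high"): 0.001,
    ("middle", "high"): 0.020,
}

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
adjusted_p = {pair: min(1.0, raw_p[pair] * n_tests) for pair in pairs}

alpha = 0.05  # assumed significance level
significant = {pair: p < alpha for pair, p in adjusted_p.items()}

for pair in pairs:
    print(pair, "raw:", raw_p[pair], "adjusted:", round(adjusted_p[pair], 3),
          "significant:", significant[pair])
```

Two of the three comparisons would pass the threshold before the correction, but only the comparison between “low” and “high” remains significant afterwards, which shows why unadjusted pairwise tests can overstate the differences between factor levels.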


Examples

Tooth growths