Difference between revisions of "Experiments"

From Sustainability Methods

Revision as of 08:57, 15 May 2020

History of laboratory experiments

Experiments describe the systematic and reproducible design to test specific hypotheses.

Starting with Francis Bacon, there was a theoretical foundation to shift previously largely unsystematic experiments into a more structured form. With the rise of disciplines in the Enlightenment, experiments thrived, also thanks to the increasing amount of resources available in Europe due to the Victorian age and other effects of colonialism. Deviating from the more observational studies in physics, astronomy, biology and other fields, experiments opened the door to the wide testing of hypotheses. All the while, Mill and others built on Bacon to derive the necessary basic debates about so-called facts, building the theoretical basis to evaluate the merit of experiments. These systematic experimental approaches aided many fields such as botany, chemistry, zoology and physics. Even more importantly, they created a body of knowledge that kickstarted many fields of research and solidified others. The value of systematic experiments, and consequently of systematic knowledge, created a direct link to the practical application of that knowledge. The scientific method - a name that ignorantly recognises no methods other than systematic experimental hypothesis testing and standardisation in engineering - hence became the motor of both the late Enlightenment and industrialisation, providing a crucial link between the Enlightenment and modernity.

Due to the demand for systematic knowledge, some disciplines ripened, meaning that dedicated departments were established, including the laboratory spaces necessary to conduct experiments. The main focus to this end was to conduct experiments that were as reproducible as possible, ideally with 100 % confidence. Laboratory conditions thus aimed at keeping conditions constant and manipulating ideally only one or a few parameters, which were then tested systematically. Necessary repetitions were conducted as well, but were of less importance at that point. Many of the early experiments were hence rather simple, but produced knowledge that was more generalisable. There was also a general tendency of experiments either working or not, which is up until today a source of great confusion, as a trial-and-error approach - despite being a valid approach - is often confused with a general mode of “experimentation”. In this sense, many people consider repairing a bike without any knowledge about bikes whatsoever a mode of “experimentation”. We therefore highlight that experiments are systematic. The next big step was the provision of certainty and of ways to calculate uncertainty, which came with the rise of probability statistics.

First in astronomy, but then also in agriculture and other fields, the notion became apparent that reproducible settings may sometimes be hard to achieve. Measurement error was a prevalent problem of optics and other apparatus in astronomy in the 18th and 19th century, and Fisher equally recognised the mess - or variance - that nature forces onto a systematic experimenter. The laboratory experiment was hence an important step towards a systematic investigation of specific hypotheses, underpinned by newly established statistical approaches.

Key concepts of laboratory experiments – sampling data in experimental designs

Statistics enabled replication as a central principle, which was first implemented in laboratory experiments. Replicates are basically repetitions of the same experiment in order to derive whether an effect is constant or has a specific variance. This variance is an essential feature of many natural phenomena, such as plant growth, but is also caused by errors such as measurement uncertainty. Hence the validity and reliability of an experiment could be better tamed.

Within laboratory experiments, control of certain variables is essential, as this is the precondition to statistically test the few variables that are the focus of the investigation. Control of variables means that the controlled variables are held constant, so that the variance in the analysis is due to the variables being tested. Consequently, such experiments are also known as controlled experiments.

By increasing the sample size, it is possible to test the hypothesis according to a certain probability, and to generate a measure of reliability. The larger the sample, the higher the statistical power. Within controlled experiments, the so-called treatments are typically groups, where even continuous gradients are constructed into factors. An example would be the amount of fertilizer, which can be constructed into “low”, “middle” and “high” amounts of fertilizer. This allows systematic testing based on a smaller number of replicates. The number of treatments or factor levels defines the degrees of freedom of an experiment. The more levels are tested, the higher the number of samples needs to be, which can be calculated based on the experimental design. Therefore, scientists design their experiments very clearly before conducting the study, and in many scientific fields such experimental designs are even submitted to a precheck and registration to highlight transparency and minimize potential flaws or manipulations.
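
The construction of factor levels from a continuous gradient, and the resulting degrees of freedom, can be sketched in a few lines of Python. This is only an illustration: the fertilizer amounts, the cut-offs and the function name `to_level` are assumptions and not part of the original text. The degrees of freedom follow the usual one-way layout (number of levels minus one between groups, number of samples minus number of levels within groups).

```python
# Hypothetical fertilizer amounts (g per pot); values and cut-offs are made up.
fertilizer_g = [2, 3, 4, 9, 10, 11, 18, 19, 20]


def to_level(amount, low_max=5, middle_max=15):
    """Map a continuous amount onto one of three constructed factor levels."""
    if amount <= low_max:
        return "low"
    if amount <= middle_max:
        return "middle"
    return "high"


levels = [to_level(a) for a in fertilizer_g]

n_total = len(levels)        # number of samples
n_levels = len(set(levels))  # number of factor levels (treatments)

df_between = n_levels - 1        # degrees of freedom of the factor
df_within = n_total - n_levels   # residual degrees of freedom

print(levels)
print("df between:", df_between, "df within:", df_within)
```

Adding a fourth level with the same nine samples would raise the factor's degrees of freedom to three and lower the residual degrees of freedom to five, which is why more levels demand more samples.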

Such experimental designs can become even more complicated when interaction effects are considered. In such experiments, two different factors are manipulated and the interactions between the different levels are investigated. A standard example would be the quantification of the growth of a specific plant species under different watering levels and amounts of fertilizer. Taken together, it is vital for researchers conducting experiments to be well versed in the diverse dimensions of the design of experiments. Sample size, replicates, factor levels, degrees of freedom and statistical power are all to be considered when conducting an experiment. Becoming well versed in designing such studies takes practice.

Analysis of Variances

One key analysis tool of laboratory experiments - but also of other experiments, as we shall see later - is the so-called Analysis of Variance (Anova). Invented by Fisher, this statistical test is - mechanically speaking - comparing the means of more than two groups, thereby overcoming the restriction of the t-test to two groups. Comparing different groups thus became a highly important procedure in the design of experiments, which besides laboratories is also highly relevant in greenhouse experiments in ecology, where conditions are kept stable through a controlled environment. The general principle of the Anova is rooted in hypothesis testing. An idealized null hypothesis is formulated, against which the data is tested. If the Anova gives a significant result, the null hypothesis is rejected, since data like these would be statistically unlikely if the null hypothesis were true. As one gets an overall p-value, it can thus be assessed whether the different groups differ overall. Furthermore, the Anova allows for a measure beyond the p-value through the sum of squares calculations, which derive how much of the variance is explained by the model, and how large in relation the residual or unexplained variance is.
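
The sum of squares logic can be made concrete with a minimal sketch of a one-way Anova in plain Python. The plant growth values and the function name `one_way_anova` are made up for illustration. The sketch stops at the F statistic and the share of explained variance, since deriving the p-value requires the F distribution, which the standard library does not provide.

```python
from statistics import mean


def one_way_anova(groups):
    """Return sums of squares, F statistic and explained share for a one-way Anova."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)

    # Variation explained by the factor: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation: single values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    explained = ss_between / (ss_between + ss_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "f": f_stat,
        "explained": explained,
    }


# Hypothetical growth (cm) under low, middle and high fertilizer.
result = one_way_anova([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
print(result)
```

In practice a statistics library would be used instead, for example `scipy.stats.f_oneway`, which additionally returns the p-value.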

Preconditions

Regarding the preconditions of the Anova, it is important to realize that the data should ideally be normally distributed within all factor levels, which however is often violated due to small sample sizes. Since a non-normal distribution may influence the outcome of the test, boxplots are a helpful visual aid, as they allow for a simple detection of non-normally distributed factor levels.

Equally, the variance should ideally be comparable across all levels, which is called homoscedasticity. Also important is the criterion of independence, meaning that samples of factor levels should not influence each other. For this reason, plants in ecological experiments are for instance typically planted in individual pots. In addition, the classical Anova assumes a balanced design, which means that all factor levels have an equal sample size. If some factor levels have fewer samples than others, this might cause problems in terms of normal distribution and variance, but there is another effect at play. Larger sample sizes on one factor level may create an imbalance, where factor levels with larger samples have a larger influence on the overall model result.
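
Two of these preconditions, a balanced design and comparable variances, can be inspected with a short sketch like the following. The data and the function name `check_design` are hypothetical. The ratio of the largest to the smallest group variance is only a rough descriptive aid and not a formal test of homoscedasticity.

```python
from statistics import variance


def check_design(groups):
    """Report whether group sizes are equal and how strongly the variances differ."""
    sizes = {name: len(values) for name, values in groups.items()}
    variances = {name: variance(values) for name, values in groups.items()}

    balanced = len(set(sizes.values())) == 1
    variance_ratio = max(variances.values()) / min(variances.values())
    return balanced, variance_ratio, variances


# Made-up growth data (cm) for three factor levels with four replicates each.
groups = {
    "low": [4, 5, 6, 5],
    "middle": [6, 7, 8, 7],
    "high": [9, 10, 12, 9],
}

balanced, variance_ratio, variances = check_design(groups)
print("balanced:", balanced)
print("variances:", variances)
print("largest / smallest variance:", variance_ratio)
```

A ratio close to one suggests comparable variances, while a large ratio is a hint to look more closely, for instance with boxplots or a formal test.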

One way and two way Anovas

Single factor analyses, also called one-way Anovas, investigate one factor variable, while all other variables are kept constant. Depending on the number of factor levels, these demand so-called randomisation, which is necessary to compensate for instance for microclimatic differences under lab conditions. Designs with multiple factors, such as two-way Anovas, test for two or more factors, which then demands testing for interactions as well. This increases the necessary sample size on a multiplicative scale, and the degrees of freedom may dramatically increase depending on the number of factor levels and their interactions. An example of such an interaction effect might be an experiment where the effects of different watering levels and different amounts of fertiliser on plant growth are measured. While both increased water levels and higher amounts of fertiliser might each increase plant growth slightly, increasing both factors jointly might lead to a dramatic increase in plant growth.
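
The watering and fertiliser example can be illustrated with made-up cell means of a two-by-two design. The numbers below are invented so that each factor alone has a small effect while both together have a large one. The sketch only computes the descriptive interaction contrast and does not perform the significance test of a two-way Anova.

```python
from statistics import mean

# Made-up plant growth (cm) per treatment cell: (water level, fertiliser level).
growth = {
    ("low", "low"): [9, 10, 11],
    ("high", "low"): [11, 12, 13],
    ("low", "high"): [11, 12, 13],
    ("high", "high"): [19, 20, 21],
}

cell_means = {cell: mean(values) for cell, values in growth.items()}

# Effect of more fertiliser, separately for each watering level.
fert_effect_low_water = cell_means[("low", "high")] - cell_means[("low", "low")]
fert_effect_high_water = cell_means[("high", "high")] - cell_means[("high", "low")]

# If the factors only added up, both effects would be equal and this would be zero.
interaction = fert_effect_high_water - fert_effect_low_water

print("fertiliser effect under low water:", fert_effect_low_water)
print("fertiliser effect under high water:", fert_effect_high_water)
print("interaction contrast:", interaction)
```

Note that this design already needs four cells with replicates in each, which is the multiplicative growth in sample size described above.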

Interpretation of Anovas

Boxplots provide a first visual clue as to whether certain factor levels might be significantly different within an Anova. If the box of one factor level lies entirely above or below the median of another factor level, then this is a good rule of thumb that there is a significant difference. When making such a graphically informed assumption, we however have to check carefully whether the data is normally distributed, as skewed distributions might distort this rule of thumb. The overarching guideline for the Anova are thus the p-values, which indicate the significance of the difference between the factor levels.
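
The boxplot rule of thumb can also be written down numerically. The following sketch checks whether the median of one factor level falls outside the box (first to third quartile) of another. The data and the function name `median_outside_box` are assumptions, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the quartiles drawn by a plotting library.

```python
from statistics import median, quantiles


def median_outside_box(level_a, level_b):
    """True if the median of level_a lies below or above the box of level_b."""
    q1, _, q3 = quantiles(level_b, n=4)  # first quartile, median, third quartile
    m = median(level_a)
    return m < q1 or m > q3


# Made-up growth data (cm) for two factor levels.
low = [4, 5, 6, 5, 4, 6, 5]
high = [9, 10, 11, 10, 9, 11, 10]

print(median_outside_box(low, high))   # boxes clearly separated
print(median_outside_box(high, high))  # a level compared with itself
```

This remains a visual heuristic only, and the p-value of the Anova is still the guideline for significance.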

It can however also be relevant to compare the difference between specific groups, which is done by a post-hoc test. A prominent example is the Tukey test, where two factor levels are compared, and this is done iteratively for all factor level combinations. Since this poses a problem of multiple testing, the p-values need to be adjusted accordingly, for instance through a Bonferroni correction. Mechanically speaking, this is comparable to conducting several t-tests between two factor level combinations, and adjusting the p-values to consider the effects of multiple testing.
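
The adjustment for multiple testing can be sketched as follows, using the Bonferroni correction named in the text: each p-value is multiplied by the number of comparisons and capped at one. The raw p-values are invented for illustration and the function name `bonferroni` is hypothetical. The Tukey test itself uses its own adjustment based on the studentized range distribution, so this is an analogy for the principle rather than a reimplementation of that test.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


levels = ["low", "middle", "high"]
pairs = list(combinations(levels, 2))  # all pairwise factor level combinations

# Made-up raw p-values from pairwise t-tests, one per pair.
raw_p = [0.010, 0.030, 0.400]
adjusted_p = bonferroni(raw_p)

for pair, raw, adj in zip(pairs, raw_p, adjusted_p):
    print(pair, "raw:", raw, "adjusted:", round(adj, 3))
```

With three factor levels there are three pairwise comparisons. With five levels there would already be ten, which shows how quickly the correction becomes strict.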

Challenges of Anova experiments

The Anova builds on a constructed world, where factor levels are, like all variables, constructs, which might be prone to errors or misconceptions. We should therefore realize that a non-significant result might also be related to the construction of the factor levels. Yet a potential flaw can also range beyond implausible results, since Anovas do not necessarily create valid knowledge. If the underlying theory is imperfect, then we might confirm a hypothesis that is overall wrong. Hence the strong benefit of the Anova - the systematic testing of hypotheses - may equally be its weakest point, as science develops, and previous hypotheses might have been imperfect if not wrong.

Furthermore, many researchers use the Anova today in an inductive sense. With more and more data becoming available, even from completely undesigned sampling sources, the Anova becomes the analysis of choice if the difference between factor levels is investigated for a continuous variable. With the emergence of big data, these applications could be seen critically, since no real hypotheses are being tested. Instead, the statistician becomes a gold digger, searching the vastness of the available data for patterns, be they causal or not. While there are numerous benefits, this is also a source of problems. Non-designed datasets will for instance not be able to test for the impact a drug might have on a certain disease. This is a problem, as systematic knowledge production is almost assumed within the Anova, but its application is these days far away from it. The inductive and the deductive world become intertwined, and this poses a risk for the validity of scientific results.

Examples

Tooth growths

External Links

Articles

Francis Bacon: A little repetition

The Enlightenment: Also some kind of repetition

History of experiments in astronomy: A short but informative text

History of the Clinical Laboratory: A brief article

Videos

The Scientific Method: An insight into Bacon's, Galileo's and Descartes' thoughts