=== Experiments ===
 
  
Experiments describe systematic and reproducible designs to test specific hypotheses.
== History of laboratory experiments ==

Starting with Francis Bacon, there was a theoretical foundation to shift previously widely unsystematic experiments into a more structured form. With the rise of disciplines during the Enlightenment, experiments thrived, also thanks to the increasing amount of resources available in Europe due to the Victorian age and other effects of colonialism. Deviating from more observational studies in physics, astronomy, biology and other fields, experiments opened the door to the wide testing of hypotheses. All the while, Mill and others built on Bacon to lead the necessary basic debates about so-called facts, building the theoretical basis to evaluate the merit of experiments. These systematic experimental approaches aided many fields such as botany, chemistry, zoology and physics, but even more importantly, these fields created a body of knowledge that kickstarted many fields of research and solidified others. The value of systematic experiments, and consequently of systematic knowledge, created a direct link to the practical application of that knowledge. The scientific method - a label that ignores the existence of any methods besides systematic experimental hypothesis testing and standardisation in engineering - hence became the motor of both the late Enlightenment and industrialisation, providing a crucial link between Enlightenment and modernity.
 
Due to the demand for systematic knowledge, some disciplines ripened, meaning that dedicated departments were established, including the necessary laboratory spaces to conduct experiments. The main focus was to conduct experiments that were as reproducible as possible, ideally with 100 % confidence. Laboratory conditions thus aimed at creating constant conditions while manipulating ideally only one or a few parameters, which were then tested systematically. Necessary repetitions were conducted as well, but were of lesser importance at that point. Many of the early experiments were hence rather simple, but produced knowledge that was more generalisable. There was also a general tendency of experiments either working or not, which is up until today a source of great confusion, as a trial-and-error approach - despite being a valid approach - is often confused with a general mode of “experimentation”. In this sense, many people consider repairing a bike without any knowledge about bikes whatsoever as a mode of “experimentation”. We therefore highlight that experiments are systematic. The next big step was the provision of certainty and ways to calculate uncertainty, which came with the rise of probability statistics.
  
 
 
== The field experiment ==
 
With a rise in knowledge, it became apparent that the controlled setting of a laboratory was not enough for the frontiers of knowledge that were being pushed. First in astronomy, but then also in agriculture and other fields, the notion emerged that reproducible settings may sometimes be hard to achieve. Measurement error in astronomy was a prevalent problem of optics and other apparatus in the 18th and 19th century, and Fisher equally recognised the mess - or variance - that nature forces onto a systematic experimenter. The demand for more food due to the rise in population, and the availability of potent seed varieties and fertiliser - both made possible thanks to scientific experimentation - raised the question of how to conduct experiments under field conditions. Experimentation in the laboratory had reached its outer borders, as plant growth experiments were hard to conduct in the small, confined spaces of a laboratory, and it was questioned whether the results were actually applicable in the real world. Hence experiments literally shifted into fields, with a dramatic effect on their design, conduct and outcome. While laboratory conditions aimed to minimize variance - ideally conducting experiments with a high confidence - the new field experiments increased sample size to tame the variability - or messiness - of factors that could not be controlled, such as subtle changes in the soil or microclimate.
 
 
Field experiments became a revolution for many scientific fields. The systematic testing of hypotheses first allowed agriculture and other fields of production to thrive, but then medicine, psychology, ecology and even economics also used experimental approaches to test specific questions. This systematic generation of knowledge triggered a revolution in science, as knowledge subsequently became more specific and detailed. Take antibiotics, where a wide array of remedies was successively developed and tested. This triggered the cascading effects of antibiotic resistance, demanding new and updated versions to keep track with bacteria that are likewise constantly evolving. This showcases that while the field experiment led to many positive developments, it also created ripples that are hard to anticipate. The problems created through fertiliser and GMOs become more and more apparent, and integrating the diversity of knowledge became a novel challenge within science. The rising number of systematic experiments that provided comparable data led to the creation of meta-analyses, which integrate knowledge from available studies to find overarching effects. For instance, several studies may show that Ibuprofen works against headache, yet other studies are inconclusive. Integrating all studies together proves, however, that overall the drug seems to work against headaches, just not in all circumstances and with certain specific restrictions. The gold standard in medicine are the Cochrane reviews, which are the most established and thought-out procedure for integrating knowledge into meta-studies. Besides rigorous standards, this implies a specific set of statistical approaches that are able to take the diversity of cases and studies into account. So-called mixed effect models became a revolution by not only investigating what we want to know, but also taking into account what we do not want to know. A good example of this is sports and exercise investigated in health studies. A drug that is being researched may have certain positive impacts as a treatment, yet this impact may be less pronounced in people who are healthy anyway through daily exercise. Another example would be that we want to investigate whether a treatment works to heal patients, but we do not want to know whether it works better in one hospital compared to another hospital.
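
To make the hospital example concrete, the following minimal sketch fits a mixed effect model in Python with the statsmodels library: the treatment is the fixed effect we want to know about, while the hospital is treated as a random effect that we merely account for. The data and all column names (hospital, treated, recovery) are simulated purely for illustration and are not taken from any real study.

<syntaxhighlight lang="Python">
# Sketch of a mixed effect model: the treatment effect is the fixed effect we
# want to know, differences between hospitals are a random effect we only
# account for. Data and column names are simulated purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
rows = []
for h in range(8):                                # 8 hypothetical hospitals
    hospital_shift = rng.normal(0, 2)             # baseline differences between hospitals
    for _ in range(30):                           # 30 patients per hospital
        treated = int(rng.integers(0, 2))         # 0 = control, 1 = treatment
        recovery = 10 + 3 * treated + hospital_shift + rng.normal(0, 1)
        rows.append({"hospital": f"H{h}", "treated": treated, "recovery": recovery})
df = pd.DataFrame(rows)

# Fixed effect: treated; random intercept per hospital
model = smf.mixedlm("recovery ~ treated", data=df, groups=df["hospital"])
result = model.fit()
print(result.summary())   # the 'treated' coefficient estimates the treatment effect
</syntaxhighlight>

The random intercept absorbs the hospital-to-hospital differences, so the estimate of the treatment effect is not distorted by the question of which hospital performs better.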
 
 
== Natural experiments ==
 
Another severe challenge that emerged out of the development of field experiments was an almost exactly opposite trend. What do we do with singular cases? How do we deal with cases that are of pronounced importance, yet cannot be replicated? A famous example from ethnographic studies are the Easter Islands. Why did the people there channel much of their resources into building gigantic statues, thereby bringing their society to the brink of collapse? While this is surely an intriguing question, there are no replicates of the Easter Islands. This is a singular problem, and such settings are often referred to as natural experiments. From a certain perspective, our whole planet is a natural experiment, and from a statistical perspective it is a problem that we do not have any replicates, besides other ramifications. Such singular cases are increasingly relevant on a smaller scale as well. With a rise in qualitative methods both in diversity and abundance, and an urge to understand even complex systems and cases, there is clearly a demand for the integration of knowledge from natural experiments. From a statistical viewpoint, such cases are difficult due to a lack of reproducibility, yet the knowledge can still be relevant, plausible and valid. To this end, I propose the concept of the niche in order to illustrate and conceptualize how single cases can still contribute to the production of knowledge. An example is the financial crisis of 2009, where many patterns were comparable to previous crises, but other factors were different. Hence this crisis is comparable to many previous crises regarding some layers of information, but also novel and not transferable regarding other dynamics.
 
 
Real-world experiments are the latest development in the diversification of the arena of experiments. These types of experiments are currently widely explored in the literature, and I do not recognize a coherent understanding of what real-world experiments are to date. They can however be seen as a continuation of the trend of natural experiments, where a solution-oriented agenda tries to generate one or several interventions, the effects of which are often tested within singular cases, but with evaluation criteria that are defined before the study is conducted. Not all studies to date define these criteria with such rigour, and the development of real-world experiments is only starting to emerge. Since this is only partly relevant for statistics, we will not elaborate further here, but highlight the available literature.
 
 
== Meta-analysis ==
 
 
XXX
 
  
== Key concepts of laboratory experiments – sampling data in experimental designs ==

Statistics enabled replication as a central principle that was first implemented in laboratory experiments. Replicates are basically repetitions of the same experiment in order to derive whether an effect is constant or has a specific variance. This variance is an essential feature of many natural phenomena, such as plant growth, but also arises from systematic errors such as measurement uncertainty. Hence the validity and reliability of an experiment could be better tamed. Within laboratory experiments, control of certain variables is essential, as this is the precondition for statistically testing the few variables that are the focus of the investigation. Control of variables means that such controlled variables are held constant, so that the variables being tested are what leads to the variance in the analysis. Consequently, such experiments are also known as controlled experiments. By increasing the sample size, it is possible to test the hypothesis according to a certain probability, and to generate a measure of reliability. The larger the sample, the higher the statistical power.

Within controlled experiments, the so-called treatments are typically groups, where even continuous gradients are constructed into factors. An example would be the amount of fertilizer, which can be constructed into a “low”, “middle” and “high” amount of fertilizer. This allows systematic testing based on a smaller number of replicates. The number of treatments or factor levels defines the degrees of freedom of an experiment. The more levels are tested, the higher the number of samples needs to be, which can be calculated based on the experimental design. Scientists therefore design their experiments very clearly before conducting the study, and in many scientific fields such experimental designs are even submitted to a precheck and registration to highlight transparency and minimize potential flaws or manipulations. Such experimental designs can become even more complicated when interaction effects are considered. In such experiments, two different factors are manipulated and the interactions between the different levels are investigated. A standard example would be the quantification of plant growth of a specific plant species under different watering levels and amounts of fertilizer. Taken together, it is vital for researchers conducting experiments to be versatile in the diverse dimensions of the design of experiments. Sample size, replicates, factor levels, degrees of freedom and statistical power all need to be considered when conducting an experiment. Becoming versatile in designing such studies takes practice.
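
As a rough sketch of such design considerations, the snippet below lays out a balanced two-factor design for the watering and fertilizer example, counts the resulting degrees of freedom, and estimates a required sample size with the power module of Python's statsmodels library. The chosen number of replicates, effect size, significance level and power are illustrative assumptions, not recommendations.

<syntaxhighlight lang="Python">
# Sketch: laying out a balanced two-factor design and estimating sample size.
# Factor levels, replicates and the assumed effect size are purely illustrative.
from itertools import product
from statsmodels.stats.power import FTestAnovaPower

watering = ["low", "middle", "high"]
fertilizer = ["low", "middle", "high"]
replicates = 5

# Balanced full-factorial design: every combination appears equally often
design = list(product(watering, fertilizer, range(replicates)))
n = len(design)                                    # 3 * 3 * 5 = 45 pots
df_watering = len(watering) - 1                    # 2
df_fertilizer = len(fertilizer) - 1                # 2
df_interaction = df_watering * df_fertilizer       # 4
df_residual = n - len(watering) * len(fertilizer)  # 36
print(n, df_watering, df_fertilizer, df_interaction, df_residual)

# Required total sample size for a one-way comparison with 3 groups,
# assuming a medium effect size (Cohen's f = 0.25), alpha = 0.05, power = 0.8
power = FTestAnovaPower()
n_required = power.solve_power(effect_size=0.25, k_groups=3, alpha=0.05, power=0.8)
print(round(n_required))                           # total number of samples across groups
</syntaxhighlight>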
  

== Analysis of laboratory experiments ==

One key analysis tool for laboratory experiments - but also other experiments, as we shall see later - is the so-called analysis of variance (Anova). Invented by Fisher, this statistical test is - mechanically speaking - comparing the means of more than two groups, extending the t-test beyond its restriction to two groups. Comparing different groups thus became a highly important procedure in the design of experiments, which besides laboratories is also highly relevant in greenhouse experiments in ecology, where conditions are kept stable through a controlled environment.

The general principle of the Anova is rooted in hypothesis testing. An idealized null hypothesis is formulated, against which the data is tested. If the Anova gives a significant result, the null hypothesis is rejected; it is then statistically unlikely that the data confirms the null hypothesis. As one gets an overall p-value, it can thus be confirmed whether the different groups differ overall. Furthermore, the Anova allows for a measure beyond the p-value through the sum of squares calculations, which derive how much of the variation is explained by the grouping, and how large the residual or unexplained variation is in relation.
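
A minimal sketch of such a test on simulated data is given below, using Python's statsmodels library; the resulting Anova table contains the sums of squares, the F statistic and the p-value discussed above. The factor levels and group means are invented for illustration.

<syntaxhighlight lang="Python">
# Minimal one-way Anova sketch on simulated data: three fertilizer levels,
# the table reports sums of squares, the F statistic and the p-value.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
group_means = {"low": 10.0, "middle": 12.0, "high": 15.0}   # invented group means
data = pd.DataFrame([
    {"fertilizer": level, "growth": rng.normal(mean, 2.0)}
    for level, mean in group_means.items()
    for _ in range(20)                                      # 20 replicates per level
])

model = smf.ols("growth ~ C(fertilizer)", data=data).fit()
print(anova_lm(model))   # sum_sq: explained (fertilizer) vs. residual variation; F and PR(>F)
</syntaxhighlight>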

Regarding the preconditions of the Anova, it is important to realize that the data should ideally be normally distributed within all factor levels, which however is often violated due to small sample sizes. Since a non-normal distribution may influence the outcome of the test, boxplots are a helpful visual aid, as they allow for a simple detection of non-normally distributed levels. Equally, the variance should ideally be comparable across all levels, which is called homoscedasticity. Also important is the criterion of independence, meaning that the samples of the factor levels should not influence each other. For this reason, plants in ecological experiments are, for instance, typically planted in individual pots. In addition, the classical Anova assumes a balanced design, which means that all factor levels have an equal sample size.
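
These preconditions can be inspected before running the test. The following hedged sketch checks normality per factor level, compares the variances across levels, and draws boxplots, again on simulated data with purely illustrative column names.

<syntaxhighlight lang="Python">
# Sketch: checking the Anova preconditions on simulated data.
# Column names ("fertilizer", "growth") are purely illustrative.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
data = pd.DataFrame([
    {"fertilizer": level, "growth": rng.normal(mean, 2.0)}
    for level, mean in {"low": 10.0, "middle": 12.0, "high": 15.0}.items()
    for _ in range(20)                       # 20 plants per factor level (balanced design)
])

# Normality within each factor level (tests are often under-powered for small samples)
for level, group in data.groupby("fertilizer"):
    stat, p = stats.shapiro(group["growth"])
    print(level, "Shapiro-Wilk p =", round(p, 3))

# Homoscedasticity: comparable variance across all levels
groups = [group["growth"] for _, group in data.groupby("fertilizer")]
stat, p = stats.levene(*groups)
print("Levene p =", round(p, 3))

# Visual check of the distribution per factor level
data.boxplot(column="growth", by="fertilizer")
plt.show()
</syntaxhighlight>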

Single-factor analyses, also called one-way Anovas, investigate one factor variable while all other variables are kept constant. Depending on the number of factor levels, these demand a so-called randomization, which is necessary to compensate, for instance, for microclimatic differences under lab conditions.
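
Randomization itself can be as simple as shuffling which treatment is placed at which position; the short sketch below assumes a single bench with numbered positions, which is of course only one possible setup.

<syntaxhighlight lang="Python">
# Sketch: randomly allocating treatments to numbered bench positions so that
# microclimatic gradients are not confounded with the factor levels.
import random

random.seed(7)
treatments = ["low", "middle", "high"] * 10       # 3 factor levels, 10 replicates each
random.shuffle(treatments)                        # random allocation across the bench

for position, treatment in enumerate(treatments, start=1):
    print(f"position {position:2d}: {treatment}")
</syntaxhighlight>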

== Take home message ==

Taken together, experiments can be powerful tools to test specific hypotheses. Experiments should be systematic, but have evolved over time to also cover field conditions as well as small samples. The next years and decades will show how the world of big data and the growing information from the science-society interaction will build on and evolve the scientific experiment. Systematic experiments will probably remain a backbone of systematic knowledge production.