Difference between revisions of "Back of the envelope statistics"

Revision as of 09:12, 30 August 2024

In short: Back of the envelope statistics revolve around rough, initial calculations that provide a first understanding of quantitative data. In this entry, some approaches are presented.

Back of the envelope statistics

Back of the envelope calculations give you a first impression of your idea and where it can go.

Back of the envelope calculations are rough estimates made on a small piece of paper, such as the back of an envelope, hence the name. They are extremely helpful to get a quick estimate of the basic numbers for a given formula or principle, which allows us either to check the plausibility of an assumption or to derive a simple explanation for a complex issue. Back of the envelope statistics can for instance be helpful when you want a rough estimate for an idea that can be expressed in numbers. Prominent examples include the dominant character encoding of the World Wide Web, as well as the development of the laser. Back of the envelope calculations are fantastic within sustainability science, because they can help us illustrate complex issues in a simple form, and they can serve as a guideline for quick planning. Therefore, they can be used within other, more complex methodological applications, such as scenario planning. By quickly calculating different scenarios, we can for instance apply a plausibility check and focus our approaches on the fly. I encourage you to practice back of the envelope calculations in your everyday life, as many of us already do. A great app for this is "Tydlig", which unfortunately only exists for iOS devices. It is however a great example of how to break numbers down into overall estimates and make quick on-the-fly calculations.

Some recommendable examples

In the following, I provide some simple examples of back of the envelope calculations, which may help you understand why these could be valuable.

Simple calculations

Adding, subtracting, multiplying and dividing are part of the everyday language of mathematics, and while we all should have learned these in school, most of us are not well versed in applying them regularly. How do we divide the bill by three? What is 20 % of the bill to add as a tip? How many pieces of cake remain? How can I double this recipe? There is ample evidence that this can be intuitive to some, yet most of us struggle. I think it is extremely important to practice regularly, and there are apps such as "Brilliant" and games like Sudoku to improve our skills with numbers. Only if you learn the basics will you be able to master more advanced mathematics.
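The everyday questions above can be written down as a minimal Python sketch. The function names and the example numbers (a bill of 60, three people, a small recipe) are illustrative assumptions and not part of the original entry.

```python
def split_bill(total, people, tip_rate=0.20):
    """Return (tip, share per person) for a bill that includes the tip."""
    tip = total * tip_rate
    share = (total + tip) / people
    return round(tip, 2), round(share, 2)


def scale_recipe(ingredients, factor):
    """Multiply every ingredient amount by the same factor."""
    return {name: amount * factor for name, amount in ingredients.items()}


# Hypothetical example values
tip, share = split_bill(60.0, 3)
print(f"Tip: {tip}, each person pays: {share}")

doubled = scale_recipe({"flour_g": 250, "eggs": 2}, 2)
print(doubled)
```

The point is less the code than the habit: a 20 % tip is one fifth of the bill, and doubling a recipe is one multiplication per ingredient.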

Probability

We often hear numbers about probability and chances. While many of these are arbitrary, such as the chances of winning the Lottery, there are other probabilities that matter in our day-to-day life. For instance, the chances of catching a COVID-19 infection outside are 19 times lower than inside. One common misconception is that the chances are 0 - they are not, they are just lower, but considerably so! Another misconception concerns how low your chances of basically anything are in general. Quite often we cannot stop calculating our chances, even though they are actually very low anyway. Will I win the Lottery? Probably not. Will I be diagnosed with a rare condition? Probably not. Still, we think more about such things than necessary, considering our actual chances. Then again, spending some time on computing the real probability of a certain event may actually help us finish the thought once and for all, and act accordingly. The COVID-19 crisis is a good example, where a combination of several modes of action, such as social distancing, wearing masks or washing your hands, can substantially lower your chances of catching the disease. Knowing about this makes a lot of sense. However, it will be next to impossible to lower your chances to 0, unless you are willing to create harm to others or yourself to an extent that may, in turn, outweigh the low chances of catching the disease. Calculating probabilities can be a good exercise to become clearer about your chances.
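How several measures combine can be sketched as follows. This is a rough illustration that assumes each measure reduces the remaining risk independently by a fixed fraction. The baseline risk and the individual reductions are hypothetical placeholders, not measured values. Only the factor of 19 between indoor and outdoor risk is taken from the text above.

```python
def combined_risk(baseline, reductions):
    """Apply relative risk reductions one after another.

    Assumes the measures act independently, which is a simplification.
    """
    risk = baseline
    for reduction in reductions:
        risk *= 1.0 - reduction
    return risk


# Hypothetical baseline risk and hypothetical reductions per measure
baseline = 0.10
measures = [0.5, 0.5, 0.2]  # e.g. distancing, masks, hand washing (made-up values)
print(combined_risk(baseline, measures))  # lower than the baseline, but never 0

# The factor from the text: outdoor risk is 19 times lower than indoor risk
indoor = 0.10  # hypothetical
outdoor = indoor / 19
print(outdoor)
```

As long as every single reduction is below 100 %, the product stays above zero, which mirrors the point that the chances get lower but do not vanish.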

However, sometimes it is not the chances or probabilities that you are interested in, but the actual numbers and what they mean, and these two things differ. A chance of 1:100000 seems rather small, yet if I tell you that this is the chance that a tourist falls off the cliff while taking a selfie at a popular tourist attraction - a scenic cliff - then you would agree to create safety measures. Numbers count. Probabilities or chances are good to put things into perspective, and to compare different risks or possibilities. However, thinking about what a certain probability actually means can really make a difference when evaluating said probability. Sometimes, it is worthwhile considering both sides of the coin. Then again, when you are afraid of flying because the plane might crash, knowing the chances of actually crashing will hardly help you to get over your fear, because the image of what it means is too prevalent in your mind. On the other hand, many smokers are aware of the health risks and seemingly do not care, and might be better off actually contextualizing these risks. I propose that we should more often calculate our chances - in other words, the probability of something - and the actual meaning of being that one in a million. This might give us a more accurate picture of our conditions and circumstances, and help us decide what to do, and how to act.
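The difference between a chance and an actual number can be made concrete with a short sketch. The chance of 1:100000 comes from the cliff example above. The number of visitors per year is a hypothetical assumption, and visits are treated as independent.

```python
def expected_cases(probability, exposures):
    """Expected number of events: chance per exposure times number of exposures."""
    return probability * exposures


def chance_of_at_least_one(probability, exposures):
    """Chance that the event happens at least once, assuming independent exposures."""
    return 1.0 - (1.0 - probability) ** exposures


p_fall = 1 / 100_000        # chance per visitor, from the example in the text
visitors_per_year = 2_000_000  # hypothetical visitor number

print(expected_cases(p_fall, visitors_per_year))
print(chance_of_at_least_one(p_fall, visitors_per_year))
```

A chance that looks negligible for one person turns into a number of cases once it is multiplied by how many people are exposed.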

Trend statistics

Another prominent example of back of the envelope calculations is knowledge about making predictions based on probabilities. Unfortunately, many predictions that we make based on rather small samples can be wrong. There are prominent examples of recent elections where much was at stake, and it was ambiguous what would come out in the end. The US elections in 2016 and 2020 are relevant examples, which showcase how different samples in forecasts lead to changes in the outcome. For instance, in 2020, over the first day after the election, the counting was leaning towards Trump, yet with more and more counts coming in from the larger cities, many states tilted towards Biden. Here, the early counts were biased towards a more positive outcome for the Republicans. However, looking at the counties, and checking where most votes were still being counted, gave a clear picture early on of a shift towards Biden. Such trend statistics can hence be rather advanced, and knowledge about the context is crucial. More often than not, this is however not respected. It is almost impossible to give an absolute minimum sample size that is reliable for any kind of calculation, so we should be aware of a given sample's limits and contextualize our decisions accordingly.
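The reasoning about where votes are still outstanding can be sketched in a few lines. All county names, vote counts and assumed shares below are invented for illustration and do not describe any real election. The sketch only shows why a partial count can point one way while the projected total points the other way.

```python
def current_totals(counties):
    """Sum the votes counted so far for candidates A and B."""
    a = sum(c["counted_a"] for c in counties)
    b = sum(c["counted_b"] for c in counties)
    return a, b


def projected_totals(counties):
    """Add outstanding votes, split by each county's assumed share for A."""
    a, b = current_totals(counties)
    for c in counties:
        a += c["outstanding"] * c["assumed_share_a"]
        b += c["outstanding"] * (1.0 - c["assumed_share_a"])
    return a, b


# Hypothetical data: the rural county is almost fully counted, the city is not
counties = [
    {"name": "rural", "counted_a": 60_000, "counted_b": 40_000,
     "outstanding": 5_000, "assumed_share_a": 0.6},
    {"name": "city", "counted_a": 20_000, "counted_b": 30_000,
     "outstanding": 100_000, "assumed_share_a": 0.4},
]

print("Counted so far:", current_totals(counties))
print("Projected:", projected_totals(counties))
```

The projection is only as good as the assumed shares, which is exactly where knowledge about the context comes in.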

The case statistics of the early COVID-19 infections from Wuhan in January 2020 gave an early picture of the spread of this disease. With this rate of increase, and the rate of mortality evolving with a few weeks' lag, the information that patients were spreading the disease before showing symptoms sealed the trend. We learned from this, and from then on, all trend calculations could be probed based on the initial development in Wuhan. Calculations about countermeasures or vaccines later were mere modifications of the same trend function. The numbers were big enough here to make plausible claims. Calculating percentage growth and then extrapolating until a maximum system saturation is reached is thus a common tool to understand what might happen. Looking at temporal trend data and understanding the patterns is one of the most essential tools in these times of global change. Become well versed in reading bar plots, and learn to calculate trends both in absolute numbers and in percentage growth. If you practice this, many developments will not come as a surprise to you.
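Percentage growth and extrapolation up to a saturation level can be sketched as follows. The case numbers, the growth rate and the capacity are hypothetical, and the simple discrete logistic update used here is one common way to model saturation, not necessarily the trend function referred to above.

```python
import math


def percent_growth(previous, current):
    """Growth from one time step to the next, in percent."""
    return (current - previous) / previous * 100.0


def doubling_time(growth_percent):
    """Number of time steps needed to double at a constant percentage growth."""
    return math.log(2) / math.log(1 + growth_percent / 100.0)


def logistic_projection(start, growth_percent, capacity, steps):
    """Extrapolate with growth that slows down as the capacity is approached."""
    rate = growth_percent / 100.0
    values = [start]
    for _ in range(steps):
        x = values[-1]
        values.append(x + rate * x * (1 - x / capacity))
    return values


# Hypothetical counts on two consecutive days
growth = percent_growth(100, 125)
print(f"Growth: {growth} % per day, doubling every {doubling_time(growth):.1f} days")

# Hypothetical saturation at 10,000 cases
trend = logistic_projection(start=100, growth_percent=growth, capacity=10_000, steps=40)
print([round(v) for v in trend[::10]])
```

Comparing the absolute increase per step with the percentage growth shows how the same trend can look dramatic or harmless depending on which number is reported.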

Group differences

While there are advanced statistical tests to compare groups and calculate differences between individual groups, comparing mean values between groups is the most relevant calculation. While there is for instance a high variance in lactose intolerance across the globe, people from certain regions should probably avoid milk more than people from other regions. Equally, some groups are more prone to specific diseases than others. This may not say much about your individual risk, but is still an initial prior that can translate into a different chance calculation (see above). Variance and mean values are not the same. We should be aware of this, and ask how things vary between individuals, and how much a mean value can really say about groups of these individuals. Consider the case of the enzyme breaking down alcohol, which is missing in some people of Asian heritage, leading to an overall lower mean alcohol tolerance in this group. Still, this might not help you in a drinking game, since the variance regarding this enzyme is quite high. Another example is the soapy taste some people experience when eating coriander. While indeed more Europeans than Asians experience a soapy taste, this difference is remarkably small, and our knowledge about this is based on a study that sampled many European participants, and few participants of Asian heritage. Hence, back of the envelope statistics can not only help you to know certain patterns better, but can equally help you to recognise flaws in other people's assumptions or calculations. It is indeed quite often the case that research highlights results that may be statistically significant, but when you check the actual patterns, the differences are quite small.
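The difference between a mean value and the variance within groups can be illustrated with a small sketch. The two groups below are invented numbers. The standardized difference (often called Cohen's d) is added here as one common way to relate a mean difference to the spread within the groups; it is not mentioned in the original entry.

```python
import math
import statistics


def mean_difference(group_a, group_b):
    """Difference between the two group means."""
    return statistics.mean(group_a) - statistics.mean(group_b)


def standardized_difference(group_a, group_b):
    """Mean difference divided by the pooled standard deviation (Cohen's d)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return mean_difference(group_a, group_b) / pooled_sd


# Hypothetical measurements: the means differ, but the groups overlap strongly
group_a = [2, 4, 6, 8, 10]
group_b = [3, 5, 7, 9, 11]

print(mean_difference(group_a, group_b))
print(standardized_difference(group_a, group_b))
```

When the spread within each group is much larger than the gap between the means, knowing the group tells you little about any single individual.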

System dynamics

The last approach I want to highlight is complex system calculations, where several calculations are broken down based on various assumptions. A prominent calculation is the carbon footprint of a person. This can be a quite advanced endeavour demanding many calculations of supply chains and circular dynamics, behavioral choices, individual circumstances and external influences. Yet, there are proxies that allow you to make a rough estimation. The combination of your food choices, travel distance and frequency - including commute habits - as well as heating and electricity usage can lead to severe changes in the carbon footprint of a person, but we can often rely on general estimates to get a first useful insight. Approximating such crude measures can often be more transformative than advanced calculations, and we had better leave those to the professionals. Building your life choices on simple heuristics can be helpful for many people, and back of the envelope calculations can support such approaches.
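A rough footprint estimate based on such proxies can be sketched as a sum of activity amounts multiplied by emission factors. The factor values below are placeholders chosen only to make the sketch run; they are not measured or official emission factors and would need to be replaced with values from a source you trust. The activity names and amounts are hypothetical as well.

```python
def rough_footprint(activities, factors):
    """Sum of activity amount times emission factor, per category (kg CO2e)."""
    breakdown = {name: amount * factors[name] for name, amount in activities.items()}
    return sum(breakdown.values()), breakdown


# PLACEHOLDER factors in kg CO2e per unit, for illustration only
placeholder_factors = {
    "car_km": 0.2,
    "electricity_kwh": 0.4,
    "heating_kwh": 0.2,
    "meat_meals": 3.0,
}

# Hypothetical yearly activity amounts of one person
my_year = {
    "car_km": 5_000,
    "electricity_kwh": 1_500,
    "heating_kwh": 6_000,
    "meat_meals": 200,
}

total, breakdown = rough_footprint(my_year, placeholder_factors)
print(f"Rough total: {total:.0f} kg CO2e per year")
for name, value in sorted(breakdown.items(), key=lambda item: -item[1]):
    print(f"  {name}: {value:.0f}")
```

Sorting the breakdown shows which category dominates, which is often the most useful first insight from such a crude estimate.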

Further information

Key publications

Maths on the back of an envelope: Clever ways to (roughly) calculate anything: by Rob Eastaway, ISBN 978-0-00-832458-2, Harper Collins (2019)

Videos

Back-of-envelope office space conundrum: A real life example

Probability: An explanation

Finding probability: An exercise

You Are the Center of The Universe (Literally): Video by Kurzgesagt

Articles

Back of the Envelope Calculation: An explanation

How to get better at back of the envelope calculations: Tips on Back of the envelope calculations

“Back-of-the-Envelope Calculations”: Tips and exercises from an astronomer

Probability: The basics

An example: Key numbers in cell and molecular biology that enable back-of-the-envelope calculations

Podcasts

The more or less podcast: By BBC

The author of this entry is Henrik von Wehrden.

@@ Line 20: / Line 20: @@
 ===Trend statistics===
-Another prominent example of back of the envelope calculations is knowledge about [https://www.youtube.com/watch?v=8bK-xfh8-rY making predictions with probability predictions]. '''Unfortunately, many predictions that we make based on rather small samples can be wrong.''' There are prominent examples of recent elections where much was at stake, and it was ambiguous what would come out in the end. The US election in 2016 and 2020 are relevant examples, which showcase how different samples in forecasts lead to changes in the outcome. For instance, in 2020, over the first day after the election, the counting was leaning towards Trump, yet with more and more counts coming in from the larger cities, many states tilted towards Biden. Here, the pre-counts were biased towards a more positive outcome for the Republicans. However, looking at the counties, and checking where most counts were still being counted, gave a clear picture early on towards a shift to Biden. Such trend statistics can hence be rather advanced, and knowledge about the context is crucial. More often than not, this is however not respected. It is almost impossible to give an absolute minimum sample size that is reliable for any kind of calculation, so we should be aware of a given sample's limits and contextualize our decisions accordingly.
+Another prominent example of back of the envelope calculations is knowledge about [https://www.youtube.com/watch?v=8bK-xfh8-rY making predictions with probability predictions]. '''Unfortunately, many predictions that we make based on rather [https://sustainabilitymethods.org/index.php/Sampling_for_Interviews small samples] can be wrong.''' There are prominent examples of recent elections where much was at stake, and it was ambiguous what would come out in the end. The US election in 2016 and 2020 are relevant examples, which showcase how different samples in forecasts lead to changes in the outcome. For instance, in 2020, over the first day after the election, the counting was leaning towards Trump, yet with more and more counts coming in from the larger cities, many states tilted towards Biden. Here, the pre-counts were biased towards a more positive outcome for the Republicans. However, looking at the counties, and checking where most counts were still being counted, gave a clear picture early on towards a shift to Biden. Such trend statistics can hence be rather advanced, and knowledge about the context is crucial. More often than not, this is however not respected. It is almost impossible to give an absolute minimum sample size that is reliable for any kind of calculation, so we should be aware of a given sample's limits and contextualize our decisions accordingly.
 The case statistics of the early COVID-19 infections from Wuhan in Januar of 2020 showed an early picture in terms of the spread of this disease. With this rate of increase, and the rate of mortality evolving with a few weeks lag, the information that patients were spreading the disease before showing symptoms sealed the trend. We learned from this, and from then on, all trend calculations could be probed based on the initial development in Wuhan. Calculations about countermeasures or vaccines later were mere modifications of the same trend function. The numbers were big enough here to make plausible claims. Calculating percentage growth and then extrapolating further up until a maximum system saturation is reached is thus a common tool to understand what might happen. Looking at temporal trend data and understanding the [[Glossary|patterns]] is one of the most essential tools in these times of global change. Become versatile in reading [[Barplots,_Histograms_and_Boxplots#Barplots|bar plots]], and learn to calculate trends both in absolute numbers as well as in percentage growth. If you practice this, many developments will not come as a surprise to you.