Difference between revisions of "Back of the envelope statistics"

Revision as of 12:06, 8 July 2021

In short: Back of the envelope statistics revolve around rough, initial calculations that provide a first understanding of quantitative data. In this entry, some approaches are presented.

Back of the envelope statistics

Back of the envelope calculations give you a first impression of your idea and where it can lead.

Back of the envelope calculations are rough estimates that are made on a small piece of paper, hence the name. They are extremely helpful for getting a quick estimate of the basic numbers for a given formula or principle, which allows us either to check the plausibility of an assumption or to derive a simple explanation for a complex issue. Back of the envelope statistics can, for instance, be helpful when you want a rough estimate of an idea that can be expressed in numbers. Prominent examples include the dominant character encoding of the World Wide Web, as well as the development of the laser. Back of the envelope calculations are fantastic within sustainability science because they can help us illustrate complex issues in a simple form, and they can serve as a guideline for quick planning. They can therefore be used within other, more complex methodological applications, such as scenario planning. By quickly calculating different scenarios, we can, for instance, apply a plausibility check and focus our approaches on the fly. I encourage you to practice back of the envelope calculations in your everyday life, as many of us already do. A great app for this is "Tydlig", which unfortunately only exists for iOS devices. It is nevertheless a great example of how to break numbers down into overall estimates and make quick on-the-fly calculations.

Some recommendable examples

In the following, I provide some simple examples of back of the envelope calculations, which may help you understand why these can be valuable.

Simple calculations

Adding, subtracting, multiplying and dividing are part of the everyday language of mathematics, and while we all should have learned these in school, most of us are not well versed in applying them regularly. How do we divide the bill by three? What is 20 % of the bill to add as a tip? How many pieces of cake remain? How can I double this recipe? There is ample evidence that this can be intuitive to some, yet most of us struggle. However, I think it is extremely important to practice regularly, and there are apps such as "Brilliant" and games like Sudoku to improve our skills with numbers. Only if you learn the basics will you be able to master more advanced mathematics.
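
The everyday questions above can be written down as a few lines of arithmetic. The following is only a minimal sketch: the function names, the bill of 60 and the recipe quantities are illustrative assumptions, not figures from this entry.

```python
def split_bill(total, people, tip_rate=0.20):
    """Return the tip, the total including tip, and the share per person."""
    tip = total * tip_rate
    grand_total = total + tip
    return tip, grand_total, grand_total / people


def scale_recipe(ingredients, factor):
    """Multiply every ingredient amount by the same factor."""
    return {name: amount * factor for name, amount in ingredients.items()}


# Hypothetical bill of 60 (any currency), shared by three people, 20 % tip.
tip, grand_total, share = split_bill(60.0, 3)
print(f"tip: {tip:.2f}, total: {grand_total:.2f}, per person: {share:.2f}")

# Hypothetical recipe, doubled.
doubled = scale_recipe({"flour_g": 250, "eggs": 2, "milk_ml": 300}, 2)
print(doubled)
```

On paper, the same shortcut is to take 10 % of the bill and double it, then divide the rounded total by the number of people.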

Probability

We often hear numbers about probability and chances. While many of these are arbitrary, such as the chances of winning the Lottery, there are other probabilities that matter in our day-to-day life. For instance, the chances of catching a COVID-19 infection outside are 19 times lower than inside. One common misconception is that the chances are 0 - they are not, they are just lower, but considerably so! Another misconception concerns how low your chances of basically anything are in general. Quite often we cannot stop computing a chance calculation, even though our chances are actually very low anyway. Will I win the Lottery? Probably not. Will I be diagnosed with a rare condition? Probably not. Still, we think more about such things than necessary, considering our actual chances. Then again, spending some time on computing the real probability of a certain event happening may actually help us finish the thought once and for all, and act accordingly. The COVID-19 crisis is a good example, where a combination of several modes of action, such as social distancing, wearing masks or washing your hands, can substantially lower your chances of catching the disease. Knowing about this makes a lot of sense. However, it will be next to impossible to lower your chances to 0, unless you are willing to create harm to others or yourself to an extent that may, in turn, outweigh the low chances of catching the disease. Calculating probabilities can be a good exercise to become clearer about your chances.

However, sometimes it is not the chances or probabilities that you are interested in, but the actual numbers and what they mean, and these two things differ. A chance of 1:100000 seems rather small, yet if I tell you that this is the chance that a tourist falls off the cliff while taking a selfie at a popular tourist attraction - a scenic cliff - then you would agree to create safety measures. Numbers count. Probabilities or chances are good to put things into perspective, and to compare different risks or possibilities. However, thinking about what a certain probability actually means can really make a difference when evaluating said probability. Sometimes, it is worthwhile considering both sides of the coin. Then again, when you are afraid of flying because the plane might crash, knowing the chances of actually crashing will hardly help you get over your fear, because the image of what it means is too prevalent in your mind. On the other hand, many smokers are aware of the health risks and seemingly do not care, and might be better off actually contextualizing these risks. I propose that we should more often calculate our chances - in other words, the probability of something - and the actual meaning of being that one in a million. This might give us a more accurate picture of our conditions and circumstances, and help us decide what to do and how to act.
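
To see why "numbers count", it can help to turn a small probability into an absolute number. The sketch below uses the 1:100000 chance from the cliff example. The visitor count is a hypothetical assumption made up for illustration, and the calculation assumes every visit carries the same, independent risk.

```python
def expected_cases(probability, exposures):
    """Expected number of events: probability per exposure times exposures."""
    return probability * exposures


def chance_of_at_least_one(probability, exposures):
    """Chance that the event happens at least once across independent exposures."""
    return 1 - (1 - probability) ** exposures


p_fall = 1 / 100_000          # chance per visitor, taken from the text
visitors_per_year = 500_000   # hypothetical visitor count, not from the text

expected = expected_cases(p_fall, visitors_per_year)
at_least_one = chance_of_at_least_one(p_fall, visitors_per_year)

print(f"expected accidents per year: {expected:.1f}")
print(f"chance of at least one accident per year: {at_least_one:.1%}")
```

A risk that looks negligible for one person can thus translate into several cases per year once enough people are exposed to it.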

Trend statistics

Another prominent example of back of the envelope calculations is knowledge about predictions. Unfortunately, many predictions that we make based on rather small samples can be wrong. There are prominent examples of recent elections where much was at stake, and it was unclear what would come out in the end. The US elections in 2016 and 2020 are relevant examples, which showcase how different samples in forecasts lead to changes in the outcome. For instance, in 2020, over the first day after the election, the count was leaning towards Trump, yet with more and more results coming in from the larger cities, many states tilted towards Biden. Here, the early counts were biased towards a more positive outcome for the Republicans. However, looking at the counties, and checking where most votes were still being counted, gave a clear picture early on of a shift towards Biden. Such trend statistics can hence be rather advanced, and knowledge about the context is crucial. More often than not, however, this is not respected. It is almost impossible to give an absolute minimum sample size that is reliable for any kind of calculation, so we should be aware of a given sample's limits and contextualize our decisions accordingly.

The case statistics of the early COVID-19 infections from Wuhan in January 2020 gave an early picture of the spread of this disease. With this rate of increase, and the rate of mortality evolving with a lag of a few weeks, the information that patients were spreading the disease before showing symptoms sealed the trend. We learned from this, and from then on, all trend calculations could be probed based on the initial development in Wuhan. Calculations about countermeasures or vaccines later were mere modifications of the same trend function. The numbers were big enough here to make plausible claims. Calculating percentage growth and then extrapolating until a maximum system saturation is reached is thus a common tool to understand what might happen. Looking at temporal trend data and understanding the patterns is one of the most essential tools in these times of global change. Become well versed in reading bar plots, and learn to calculate trends both in absolute numbers and in percentage growth. If you practice this, many developments will not come as a surprise to you.
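
Percentage growth and extrapolation up to a saturation level can be sketched in a few lines. The counts, the growth rate and the saturation value below are hypothetical, and the slowing-down rule (a simple logistic step) is one common assumption, not the trend function used for any real outbreak.

```python
def percentage_growth(counts):
    """Growth from each value to the next, in percent."""
    return [(b - a) / a * 100 for a, b in zip(counts, counts[1:])]


def extrapolate(start, growth_rate, saturation, steps):
    """Project values forward; growth slows as the saturation level gets closer."""
    values = [start]
    for _ in range(steps):
        current = values[-1]
        # Logistic step: full growth while far from saturation, none when reached.
        current += growth_rate * current * (1 - current / saturation)
        values.append(current)
    return values


counts = [100, 130, 169]              # hypothetical case counts per period
growth = percentage_growth(counts)    # about 30 % per period
projection = extrapolate(counts[-1], 0.30, saturation=10_000, steps=50)

print([round(g, 1) for g in growth])
print([round(v) for v in projection[:10]])
```

Comparing the absolute increases (30, then 39) with the constant percentage growth shows why both views are worth calculating.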

Group differences

While there are advanced statistical tests to compare groups and calculate differences between individual groups, comparing mean values between groups is the most relevant calculation. While there is, for instance, a high variance in lactose intolerance across the globe, people from certain regions should probably avoid milk more than people from other regions. Equally, some groups are more prone to specific diseases than others. This may not say much about your individual risk, but is still an initial prior that can translate into a different chance calculation (see above). Variance and mean values are not the same. We should be aware of this, and ask how things vary between individuals, and how much a mean value can really say about groups of these individuals. Consider the case of the enzyme breaking down alcohol, which is missing in some people of Asian heritage, leading to an overall lower mean value of alcohol tolerance in this group. Still, this might not help you in a drinking game, since the variance regarding this enzyme is quite high. Another example is the soapy taste some people experience when eating coriander. While indeed more Europeans than Asians experience a soapy taste, this difference is remarkably small, and our knowledge about this is based on a study that sampled many European participants and few participants of Asian heritage. Hence, back of the envelope statistics can not only help you to know certain patterns better, but can equally help you to recognise flaws in other people's assumptions or calculations. It is indeed quite often the case that research highlights results that may be statistically significant, but when you check the actual patterns, these are quite small.
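
The point that mean values and variance are different things can be checked quickly by setting the difference between two group means against the spread within the groups. The two groups below are made-up numbers for illustration, and the standardized difference (often called Cohen's d, here with a pooled standard deviation for equally sized groups) is one common way to express this, chosen as an assumption for this sketch.

```python
from statistics import mean, stdev

# Hypothetical measurements for two groups of equal size.
group_a = [4, 9, 5, 10, 6, 8]
group_b = [5, 10, 6, 11, 7, 9]

# Difference between the group means.
diff = mean(group_b) - mean(group_a)

# Pooled standard deviation: how much individuals vary within the groups.
pooled_sd = ((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2) ** 0.5

# Mean difference relative to the individual spread.
effect = diff / pooled_sd

print(f"mean difference: {diff:.2f}")
print(f"spread within groups: {pooled_sd:.2f}")
print(f"standardized difference: {effect:.2f}")
```

Here the groups differ in their means, yet most individuals of both groups overlap, so the mean alone says little about any single person.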

System dynamics

The last approach I want to highlight is complex system calculations, where several calculations are broken down based on various assumptions. A prominent example is the carbon footprint of a person. This can be quite an advanced endeavour, demanding many calculations of supply chains and circular dynamics, behavioral choices, individual circumstances and external influences. Yet, there are proxies that allow you to make a rough estimation. The combination of your food choices, travel distance and frequency - including commute habits - as well as heating and electricity usage can lead to substantial changes in the carbon footprint of a person, but we can often rely on general estimates to get a first useful insight. Approximating such crude measures can often be more transformative than advanced calculations, which are better left to the professionals. Building your life choices on simple heuristics can be helpful for many people, and back of the envelope calculations can support such approaches.
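
Such a proxy calculation usually boils down to multiplying a few activity amounts by a rough factor each and adding them up. The sketch below shows only this structure: the category names, the yearly amounts and above all the emission factors are placeholder assumptions, not real data, and would need to be replaced with figures from a trusted source.

```python
def rough_footprint(activities, factors):
    """Multiply each yearly activity amount by its emission factor (kg CO2e)."""
    return {name: amount * factors[name] for name, amount in activities.items()}


# Hypothetical yearly activity amounts for one person.
activities = {"car_km": 8000, "electricity_kwh": 2500, "heating_kwh": 9000}

# PLACEHOLDER factors in kg CO2e per unit, for illustration only.
factors = {"car_km": 0.2, "electricity_kwh": 0.4, "heating_kwh": 0.2}

breakdown = rough_footprint(activities, factors)
total = sum(breakdown.values())

# Print the largest contributions first.
for name, value in sorted(breakdown.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.0f} kg CO2e ({value / total:.0%} of total)")
print(f"total: {total:.0f} kg CO2e per year")
```

Even with crude factors, the ranking of the categories often shows where a change in behavior would matter most.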

The author of this entry is Henrik von Wehrden.

@@ Line 1: / Line 1: @@
+'''In short:''' Back of the envelope statistics revolve around rough, initial calculations that provide a first understanding of quantitative data. In this entry, some approaches are presented.
 ==Back of the envelope statistics==
+[[File:Bildschirmfoto 2020-04-08 um 11.37.25.png|thumb|left|Back of the envelope calculations give you a first impression about your idea and where it can go to.]]
-[[File:Bildschirmfoto 2020-04-08 um 11.37.25.png|thumb|left|Back of the envelope calculations give you a first impression about your idea and where it can go to.]]
+[https://www.investopedia.com/terms/b/back-of-the-envelope-calculation.asp Back of the envelope calculations] are rough estimates that are made on the small piece of paper, hence the name. They are extremely helpful to get a quick estimate about the basic numbers for a given formula of principle, thus enabling us to get a [https://www.stlouisfed.org/on-the-economy/2020/march/back-envelope-estimates-next-quarters-unemployment-rate quick calculation] that allows us either to check for the plausibility of an assumption, or to derive a simple explanation for a complex issue. '''Back of the envelope statistics can be for instance helpful when you want to get a rough estimate about an idea that can be expressed in numbers.''' Prominent examples include the dominant character coding of the World Wide Web, as well as the development of the laser. Back of the envelope calculations are fantastic within sustainability science, because they can help us illustrate complex issues in a simple form, and they can serve as a guideline for quick planning. Therefore, they can be used within other more complex forms of methodological applications, such as scenario planning. By quickly calculating different scenarios, we can for instance apply a plausibility check and focus our approaches on-the-fly. I encourage you to learn back of the envelope calculations in your [https://www.youtube.com/watch?v=bAU1MLRwh7c everyday life], as many of us already do. A great app for this is [http://tydligapp.com/ "Tydlig"], which unfortunately only exists for iOS devices. It is a great example however of how to break numbers down into overall estimates, and make quick on-the-fly calculations.
+<br/>
+<br>
-[https://www.investopedia.com/terms/b/back-of-the-envelope-calculation.asp Back of the envelope calculations] are rough estimates that are made on the small piece of paper, hence the name. These are extremely helpful to get a quick estimate about the basic numbers for a given formula of principle, thus enable us to get her [https://www.stlouisfed.org/on-the-economy/2020/march/back-envelope-estimates-next-quarters-unemployment-rate quick calculation] with either the goal to check for the plausibility of the assumption, or to derive a simple explanation of the more complex issue. Back of the envelope calculations can be for instance helpful when you want to get a rough estimate about an idea that can be expressed in numbers. Prominent examples for back of the envelope calculations include the dominant character coding of the World Wide Web and the development of the laser. Back of the envelope calculations are fantastic within sustainability science, I think, because they can help us to illustrate complex issues in a more simple form, and they can serve as her guideline for a quick planning. Therefore, they can be used within other more complex forms of methodological applications, such as scenario planning. By quickly calculating different scenarios we can for instance make her plausibility check and focus our approaches on-the-fly. I encourage you to learn back of the envelope calculations in your [https://www.youtube.com/watch?v=bAU1MLRwh7c everyday life], as many of us already do. I learned to love "Tydlig", which is one of the best apps I ever used, but unfortunately I only know her version for my Apple devices. It can however be quite helpful to break numbers down into overall estimates, as the video below illustrates.
+== Some recommendable examples ==
+In the following, I provide you some simple examples of back of the envelope calculations, which may help you gain some understanding on why these could be valuable.
-==Examples of back of the envelope calculations==
+=== Simple calculations ===
-Simple calculations
+Adding, subtracting, multiplying and dividing are part of the everyday language of mathematics, and while we all should have learned these in school, most of us are not versatile in applying them regularly. How do we divide the bill by three? What is 20 % of the bill to add as a tip? How many pieces of cake remain? How can I double this recipe? There is ample evidence that this can be intuitive to some, yet most of us struggle. However, I think it is extremely important to regularly practice, and there are apps such as [https://brilliant.org/ "Brilliant"] and games like Sudoku to improve our skills with numbers. Only if you learn the basics, you will be able to master the supreme mathematics.
-Adding, subtracting, multiplying and dividing are part of the everyday language of mathematics, and while we all should have learned these in school, most of us are not versatile in applying them regularly. How do we divided the bill by three? What is 20 % of the bill to add as a tip? How many pieces of cake remain? How can I double this recipe? There is ample evidence that this can be intuitive to some, yet most of us struggle. However, I think it is extremely important to regularly practice, and there are apps such as brilliant and games like Sudoko to improve our skills with numbers. Only if you learn the basics will you be able to master the supreme mathematics. In the following I provide you some simple examples of back of the envelope calculations, w which may help you to gain some understanding on why these could be valuable.
 ===Probability===
-We often hear numbers about probability and chances. While many of these are arbitrary, such as the chances to win the Lottery, there are others probabilities that matter in our day to day life. For instance are chances of catching a Covid infection outside 19 times lower than inside. One common misconception is that the chances are not 0, the caches are just lower. Another misconception is the question of how low your chances are in general. Quite often we cannot stop computing a chance calculation where our chances are actually low. Will I win the Lottery (probably not), or will I be diagnosed with a rare condition (probably not). Still, we think more about such things as are actually our chances. Being more clear about the probability of a certain event to happen may actually help us to compute the chances once and for all, and then act accordingly. For instance does the Corona crisis consists of a good example where a combination of several modes of action, such as social distancing, wearing masks or washing your hands can substantially lower your chances of catching the disease. However, it will be next to impossible to lower your chances to 0, at least not to the amount to creating harm to others or yourself that may outweigh the low chances of catching the disease. Calculating probabilities can be a good exercise to be more clear about your chances.
+We often hear numbers about probability and chances. While many of these are arbitrary, such as the chances of winning the Lottery, there are others probabilities that matter in our day-to-day life. For instance, the chances of catching a COVID-19 infection outside is 19 times lower than inside. One common misconception is that the chances are 0 - they are not, they are just lower, but considerably so! Another misconception is the question of how low your chances of basically anything are in general. Quite often we cannot stop computing a chance calculation, even though our chances are actually very low anyway. Will I win the Lottery? Probably not. Will I be diagnosed with a rare condition? Probably not. Still, we think more about such things than necessary, considering our actual chances. Then again, spending some time on computing the real probability of a certain event happening may actually help us finish the thought once and for all, and act accordingly. The COVID-19 crisis is a good example, where a combination of several modes of action, such as social distancing, wearing masks or washing your hands, can substantially lower your chances of catching the disease. Knowing about this makes a lot of sense. However, it will be next to impossible to lower your chances to 0, unless you would be willing to create harm to others or yourself in an extent that may, in turn, outweigh the low chances of catching the disease. '''Calculating probabilities can be a good exercise to be more clear about your chances.'''
-However, sometimes it is not the chances or probabilities that you are interested in, but the actual  numbers, and these two differ. A chance of 1:100000 seems rather smallish, yet if I tell you that this is the chance of a popular touristic attraction -a scenic cliff- that a  tourist falls of the cliff while taking a selfie, then you would agree to create safety measures. Numbers count. There is a difference between proportions or chances on the one end, and absolute numbers on the other end, and sometimes we need to choose one over the other. When you are for instance afraid of flying in a plane, because the plane might crash, knowing the chances of actually crashing will hardly help you to get over your fear. On the other hand are many smokers aware of the risks and seemingly do not care. I propose that we should more often calculate our chance for in other words the probability, as this might give us an actually more accurate picture of our conditions and circumstances. Based on this information, we can then decide what to do, and how to act.
+However, sometimes it is not the chances or probabilities that you are interested in, but the actual numbers and what they mean, and these two things differ. A chance of 1:100000 seems rather smallish, yet if I tell you that this is the chance of a popular touristic attraction - a scenic cliff - that a tourist falls of the cliff while taking a selfie, then you would agree to create safety measures. '''Numbers count.''' Probabilities or chances are good to put things into perspective, and compare different risks or possibilities. However, thinking about what a certain probability actually means can really make a difference when evaluating said probability. Sometimes, it is worthwhile considering both sides of the coin. Then again, when you are afraid of flying in a plane because the plane might crash, knowing the chances of actually crashing will hardly help you to get over your fear because the image of what it means is too prevalent in your mind. On the other hand, many smokers are aware of the healths risks and seemingly do not care, and might be better off actually contextualizing these risks. I propose that we should more often calculate our chances - in other words, the probability of something - and the actual meaning of being that one in a million. This might give us a more accurate picture of our conditions and circumstances, and help us decide what to do, and how to act.
 ===Trend statistics===
-Another prominent example of back of the envelope calculations is knowledge about predictions. Here, several misconceptions are at stake, and it may be beneficial to debunk these. First of all, we have to conclude that many of predictions that we make of smaller samples can be wrong. There are prominent examples of recent elections where much was at stake, and it was ambiguous what would come out in the end. The US election in 2016 and 2020 was one of the most relevant examples, which showcase how different samples lead to changes in the outcome. For instance was in 2020 over the first day after the election the counting leaning towards Trump, yet with more and more counts coming in from the larger cities, many states tilted towards Biden. Here, the pre-counts were biased towards a more positive outcome for the republicans. However, looking at the counties, and checking where most counts were still being counted gave a clear picture early on towards a shift to Biden. Such trend statistics can hence be rather advanced, and knowledge about the context goes a long way. More often than not, this is however not the case. The case statistics from Wuhan showed an early picture of the priors in terms of the spread of this disease, and already in late January were the priors of the disease rather clear. With this rate of increase, and the rate of mortality evolving with a few weeks lack, the information that patients were spreading the disease before showing symptoms sealed the trend. From then on, all trend calculations could be probed based on the initial development in Wuhan, and calculating in counter measure or vaccines later were mere modifications of the same trend function. Calculating percentage growth and then extrapolating further up until a maximum system saturation is thus a common tool to understand what might happen. Looking at temporal trend data and understanding the patterns is one of the most essential tools in these times of global change. 
Become versatile in reading bar plots, and learn to calculate trends both in absolute numbers as well as in percentage growth, then many developments will not come as a surprise to you.
+Another prominent example of back of the envelope calculations is knowledge about predictions. '''Unfortunately, many predictions that we make based on rather small samples can be wrong.''' There are prominent examples of recent elections where much was at stake, and it was ambiguous what would come out in the end. The US election in 2016 and 2020 are relevant examples, which showcase how different samples in forecasts lead to changes in the outcome. For instance, in 2020, over the first day after the election, the counting was leaning towards Trump, yet with more and more counts coming in from the larger cities, many states tilted towards Biden. Here, the pre-counts were biased towards a more positive outcome for the Republicans. However, looking at the counties, and checking where most counts were still being counted, gave a clear picture early on towards a shift to Biden. Such trend statistics can hence be rather advanced, and knowledge about the context is crucial. More often than not, this is however not respected. It is almost impossible to give an absolute minimum sample size that is reliable for any kind of calculation, so we should be aware of a given sample's limits and contextualize our decisions accordingly.
+The case statistics of the early COVID-19 infections from Wuhan in Januar of 2020 showed an early picture in terms of the spread of this disease. With this rate of increase, and the rate of mortality evolving with a few weeks lag, the information that patients were spreading the disease before showing symptoms sealed the trend. We learned from this, and from then on, all trend calculations could be probed based on the initial development in Wuhan. Calculations about countermeasures or vaccines later were mere modifications of the same trend function. The numbers were big enough here to make plausible claims. Calculating percentage growth and then extrapolating further up until a maximum system saturation is reached is thus a common tool to understand what might happen. Looking at temporal trend data and understanding the [[Glossary|patterns]] is one of the most essential tools in these times of global change. Become versatile in reading [[Barplots,_Histograms_and_Boxplots#Barplots|bar plots]], and learn to calculate trends both in absolute numbers as well as in percentage growth. If you practice this, many developments will not come as a surprise to you.
 ===Group differences===
-While there are statistical tests to compare groups and calculate differences between individual groups, it is clear more often than not, comparing mean values between groups is the most relevant calculation. While there is for instance a high variance in lactose intolerance across the globe, people from certain regions should avoid milk probably more than people form other groups. Equally are some groups more prone to specific despises than others. This may not say much about your individual risk, but is still an initial prior that can translate into a different chance calculation (see above). Variance and mean values are not the same. Consider the case of the enzyme braking down alcohol, which is missing in some people of Asian heritage. While the mean values in terms of Alcohol intolerance are clear, this may not help you in a drinking game, since the variance regarding this enzyme is quite high. Another example is the soapy taste some people experience when eating corianthe. While indeed more Europeans as compared to some Asians experience a soapy taste, this difference is remarkably small, and based on a study that samples many Europeans, and few participants of Asian heritage. Hence back of the envelope calculations can not only help you to know certain patterns better, but can equally help you to recognise flaws in other peoples assumptions or calculations. It is indeed quite often the case that research highlight results that may be statistically significant, but when you check out the actually patterns, then these are quite small.
+While there are advanced statistical tests to [[Simple Statistical Tests|compare groups]] and calculate differences between individual groups, comparing mean values between groups is the most relevant calculation. While there is for instance a high variance in lactose intolerance across the globe, people from certain regions should avoid milk probably more than people from other groups. Equally, some groups are more prone to specific diseases than others. This may not say much about your individual risk, but is still an initial prior that can translate into a different chance calculation (see above). '''Variance and mean values are not the same.''' We should be aware of this, and ask how things vary between individuals, and how much a mean value can really say about groups of these individuals. Consider the case of the enzyme breaking down alcohol, which is missing in some people of Asian heritage, leading to an overall lower mean value of Alcohol intolerance in this group. Still, this might not help you in a drinking game, since the variance regarding this enzyme is quite high. Another example is the soapy taste some people experience when eating corianthe. While indeed more Europeans than Asians experience a soapy taste, this difference is remarkably small, and our knowledge about this is based on a study that sampled many European participants, and few participants of Asian heritage. Hence, Back of the envelope statistics can not only help you to know certain patterns better, but can equally help you to recognise flaws in other peoples [[Glossary|assumptions]] or calculations. It is indeed quite often the case that research highlight results that may be statistically significant, but when you check out the actual patterns, then these are quite small.
 ===System dynamics===
-The last calculation I want to highlight are complex system calculations where several calculations are broken down based on various assumptions. A prominent calculation is the carbon footprint of a person. While this can be a quite advanced endeavour demanding many calculations of supply chains and circular dynamics, there are proxies that can allow you to make a rough estimation. The combination of your food choices, travel distance and frequency -including commute habits- as well as heating and electricity usage can lead to severe changes in the carbon footprint of a person. Approximating such crude measures can often be more transformative than advanced calculation we better leave to the professionals. Building your life choices on simple heuristics can be helpful for many people, and back of the envelope calculations can support such approaches.
+The last approach I want to highlight are complex system calculations where several calculations are broken down based on various assumptions. A prominent calculation is the carbon footprint of a person. This can be a quite advanced endeavour demanding many calculations of supply chains and circular dynamics, behavioral choices, individual circumstances and external influences. Yet, there are proxies that can allow you to make a rough estimation. The combination of your food choices, travel distance and frequency - including commute habits - as well as heating and electricity usage can lead to severe changes in the carbon footprint of a person, but we can often rely on general estimates to get a first useful insight. '''Approximating such crude measures can often be more transformative than advanced calculation, and we better leave those to the professionals.''' Building your life choices on simple heuristics can be helpful for many people, and Back of the envelope calculations can support such approaches.
 ----