# Poisson Distribution in Python

## 1. Introduction to Probability Distribution

A probability distribution summarizes the probabilities for the values of random variable. The general properties/features that define a distribution include the four mathematical moments:

1. Expected value E(X): This is the outcome with the highest probability or likelihood or an average or mean value for the variable X.

2. Variance Var(X): Variance denotes the spread of the values from the expected value. The standard deviation is the normalized value of variance obtained by taking a square root of the variance. The covariance summarizes the linear relationship for how the two variables change with respect to each other.

3. Skewness: Skewness is the measure of asymmetry of a random variable X about the mean E(X) of a probability distribution. It can be positive (tail on the right side), negative (tail on left side), zero (balanced tail on both sides) or undefined. Zero skewness does not always mean symmetric distribution as one tail can be long and thin and the other can be short and fat.

4. Kurtosis: Kurtosis is the measure of ‘ *peaked-ness* ’ of distribution. It can be differentiating tool between distributions that have same mean and variance. A Higher kurtosis means a peak and fatter/extended tails.

Each random variable has its own probability distribution that may have similar shape to other variables, however, the structure will differ based on whether the variable is discrete or continuous, since probabilitz distributions of continuous variables have an infinite number of values and probability functions of discrete variables assign a probability to each concrete data point.

## 2. Poisson Distribution

Poisson Distribution is one of the discrete probability distributions along with binomial, hypergeometric and geometric distribution. The following table differentiates what applies where.

Distribution Definition
Binomial It is used when there are two possible outcomes (success/failure) in a process that are independent of each other in n number of trials. Easiest example is a coin toss whereas a more practical use of binomial distribution is testing a drug, whether the drug cures the disease or not in n number of trials
Hypergeometric It calculates the number of k successes in n number of trials where the probability of success changes with each passing trial. This kind of distribution applies in Poker when after drawing a red card from the deck changes the probability of drawing another red card after it.
Poisson It provides the probability of an event happening a certain number of times (k) within a given interval of time or space. For example, figuring out the probability of disease occurrence m times in the next month given that it occurs n times in 1 year.
Geometric It determines the number of independent trials needed to get the first successful outcome. Geometric distribution may be used to conduct a cost-benefit analysis of a certain decision in a business.

Poisson distribution can only be used under three conditions: 1. Data is counts of events i.e., they are non-negative integers. 2. The events are random and occur independent of each other. 3. The mean number of events occurring a specific time frame is constant and known.

For example, Poisson distribution is used by mobile phone network companies to calculate their performance indicator like efficiency and customer satisfaction ratio. The number of network failures in a week in a locality can be measured given the history of network failures in a particular time duration. This predictability prepares the company with proactive solution and customers are warned in advance. For more real-life example of Poisson distribution in practice, visit this page.

## 3. Calculation and Probability Mass Function (PMF) Graph

### Probability Mass Function Graph

The Poisson distribution probability mass function (pmf) gives the probability of observing k events in a time period given the length of the period and the average events per time.

T = Time interval e = natural logarithmic base k = number of events The K! means K factorial. This means that we multiply all integers from K down to 1. Say K is 4 then K!= 4* 3* 2* 1= 24

Introducing lamda, λ, as a rate parameter (events/time)*T into the equation gives us

So, λ is basically the expected number of event occurrence in an interval, a function of number of events and the time interval. Changing λ means changing the probability of event occurrence in one interval.

For example, Ladislaus Bortkiewicz calculated the deaths of soldier by horse kicks in Prussian Army using Poisson Distribution in the late 1800s. He analyzed two decades of data of up to 10 army corps, equivalent to two centuries of observations for one army corps.

Figure 1: Poisson Distribution for deaths by horse kick by Ladislaus Bortkiewicz. Source: Scribbr

According to his observation, an average of 0.61 soldier died every year. However, the deaths were random, for example, in one year four soldiers died and for most year no deaths were caused by horses. Figure 1 shows the probability mass function graph. In Poisson Distribution terms,

• Death caused by horse kick is an ‘event’
• Mean per time interval is, represented by λ, is 0.61
• The number of deaths by horse kick in specific year is k.

Figure 2 is the probability mass function of the Poisson distribution and shows the probability (y-axis) of a number of events (x-axis) occurring in one interval with different rate parameters. You can see the most likely number of event occurrence for a graph is equivalent to its rate parameter (λ) as it is the expected number of events in one interval when it is an integer. When it is not an integer, the number of events with highest probability would be nearest to λ. The Lamda, λ, also represents the mean and variance of the distribution.

## 4. Features and Properties

Some properties of Poisson distribution are:

1. As seen in figure 2, if a quantity is Poisson Distributed with rate parameter, λ, it’s average value is λ.

2. Variance of the distribution is also λ, implying when λ increases the width of the distribution goes as square root lamda √λ.

3. A converse in Raikov’s theorem says that if the sum of two independent random variable is Poisson distributed, then so are each of those two independent random variables Poisson distributed.

Consider the example of radioactive decay for long-lived isotopes, in a radioactive sample containing a large number of nuclei, each of which has a tiny probability of decaying during some time interval, T. Let’s say the rate of decay is 0.31 decay/second and is monitored for 10 seconds. That gives us the λ1 0.31 * 10 = 3.1 which means the probability equals,

Consider another example where λ2 is 2.7, the probability is

Therefore, If we look at the total number k = k1 + k2 of a radioactive decay in a time T, the result is also a Poisson distribution with λ = λ1 + λ2 -> λ = 3.1 + 2.7 = 5.8 : P(k, λ1 + λ2 )

If we have two independent Poisson-distributed variable, their sum is Poisson distributed too.

4. The skewness us measured by 1/√λ

5. Excess kurtosis is measured by 1/λ. See the difference in Excess kurtosis and kurtosis here. In a nutshell, excess kurtosis compares the curtosis of the distribution with the curtosis of a normal distribution and can therefore tell you if an (extreme) event is more or less likely in this case than if the distribution followed a normal distribution.

6. Poisson Limit Theorem states that Poisson distribution may be used as an approximation to the binomial distribution, under certain conditions. When the value of n (number of trials) in a binomial distribution is large and the value of p (probability of success) is very small, the binomial distribution can be approximated by a Poisson distribution i.e., n -> ∞ and λ = np, rate parameter, λ is defined as number of trials, n, in binomial distribution multiplied by probability of success, p.

7. A Poisson distribution with high mean λ > 20 can be approximated as normal distribution. However, as normal distribution is a continuous probability distribution, a continuity correction is necessary. It would exceed the scope to discuss in detail here what this correction is. In short, it adds or substracts 0.5 to the value in question to increase the accuracy of the estimation when using a continuous probability approach for something that has discrete probabilities. For example, a factory with 45 accidents per year follows a Poisson distribution. Normal approximation would suggest that the probability of more than 50 accidents can be computed as following:

Mean = λ = μ =45

Standard deviation = ∂ = √λ = 6.71 P(X>50.5) -> after continuity correction =

using Z score table, see more here.

## 5. Poisson Distribution in Python

The following example generates the random data of number of duststorms in the city of Multan. The lamda is set at 3.4 storms per year and the data is monitored for 10,000 years.

```#import libraries

from scipy.stats import poisson ## to calculate the passion distribution
import numpy as np ## to prepare the data
import pandas as pd ## to prepare the data
import matplotlib.pyplot as plt ## to create plots```
```#Random Example: Modeling the frequency of number of duststorms in Multan, Pakistan

#creating data for 10000 years using scipy.stat.poisson library
#Rate lamda = 3.4 duststorm in Multan every year

d_rvs = pd.Series(poisson.rvs(3.4, size=10000, random_state=2)) #random_state so we can reproduce it #turning into panda series
d_rvs[:20] # first 20 entry, so the first 20 years, with the number of storms on the right```
`d_rvs.mean() # mean of 10000 values is 3.38, approximately to what we setfhors as average duststorm per year.`

Outcome: 3.3879

```#showing the frequency of years against the number of storms per year and sorting it by index.
data = d_rvs.value_counts().sort_index().to_dict() #storing in a dictionary
data
## You can see that the most years hat 2-4 storms which is also represented in the calculated average and our lambda.```

Outcome:

```0: 357,
1: 1122,
2: 1971,
3: 2144,
4: 1847,
5: 1253,
6: 731,
7: 368,
8: 126,
9: 54,
10: 24,
11: 2,
12: 1
```
```fig, ax = plt.subplots(figsize=(16, 6))
ax.bar(range(len(data)), list(data.values()), align='center')
plt.xticks(range(len(data)), list(data.keys()))
plt.show()```

The resulting pmf confirms that the closest integer value to lamba i.e., 3 has the most number of years out of 10,000 meaning most years will have 3 duststorms. You can also see that the data is slightly skewed to the right, since there is a larger variance to the right than to the left. Looking at the distribution, it looks fairly normally distributed. However, the low lambda does not allow to use the Poisson distribution as an approximation for a normal distribution. Most probably, the large dataset allows to see it as a normal distribution, since most distributions converge to a normal distribution with increasing sample size.

## 6. References

The author of this entry is XX. Edited by Milan Maushart