12_RA

hrafnulf13
Dec 2, 2020

A Poisson Process is a model for a series of discrete events where the average time between events is known, but the exact timing of events is random [1]. The arrival of an event is independent of the event before it (the waiting time between events is memoryless). For example, suppose we own a website that, according to our content delivery network (CDN), goes down on average once every 60 days, but one failure doesn’t affect the probability of the next. All we know is the average time between failures. This is a Poisson process that looks like:

The important point is that we know the average time between events, but the events themselves are randomly spaced (stochastic). We might have back-to-back failures, but we could also go years between failures due to the randomness of the process.
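A quick way to see this randomness is to simulate the website example. The sketch below assumes the standard construction of a Poisson process from independent, exponentially distributed gaps with a mean of 60 days; `simulate_failures` and its parameters are hypothetical names chosen for illustration, and the 600-day horizon matches the interval used for the website later in this post.

```python
import random

def simulate_failures(mean_gap_days, horizon_days, seed=None):
    """Return the failure times (in days) of a Poisson process on [0, horizon_days).

    Assumption: gaps between failures are independent exponential random
    variables with mean `mean_gap_days`, which is what makes the process Poisson.
    """
    rng = random.Random(seed)
    rate = 1.0 / mean_gap_days           # average failures per day
    times = []
    t = rng.expovariate(rate)            # waiting time until the first failure
    while t < horizon_days:
        times.append(t)
        t += rng.expovariate(rate)       # next gap does not depend on the past
    return times

failures = simulate_failures(mean_gap_days=60, horizon_days=600, seed=1)
print(f"{len(failures)} failures in 600 days")
gaps = [later - earlier for earlier, later in zip(failures, failures[1:])]
if gaps:
    print(f"shortest gap: {min(gaps):.1f} days, longest gap: {max(gaps):.1f} days")
```

Rerunning with different seeds changes both the number of failures and their spacing, even though the average gap stays at 60 days.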

A Poisson Process meets the following criteria (in reality many phenomena modeled as Poisson processes don’t meet these exactly):

Events are independent of each other. The occurrence of one event does not affect the probability another event will occur.
The average rate (events per time period) is constant.
Two events cannot occur at the same time.

The last point (events are not simultaneous) means we can think of each sub-interval of a Poisson process as a Bernoulli trial, that is, either a success or a failure, provided the sub-intervals are short enough that two events in the same one are negligibly unlikely. With our website, the entire interval may be 600 days, but in each one-day sub-interval our website either goes down or it doesn’t.
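The sub-interval view can be checked numerically. This sketch treats each of the 600 days as one Bernoulli trial with an assumed failure probability of 1/60 per day (so 10 failures are expected) and averages the count over many simulated runs; the function name is hypothetical. The count produced this way is Binomial(600, 1/60), which previews the Binomial connection in the notes later in this post.

```python
import random

def count_daily_failures(days, p_per_day, rng):
    """Count failures when each day is one Bernoulli trial: fail or not."""
    return sum(1 for _ in range(days) if rng.random() < p_per_day)

rng = random.Random(42)
days, p = 600, 1 / 60                  # expected failures: days * p = 10
runs = [count_daily_failures(days, p, rng) for _ in range(2000)]
mean = sum(runs) / len(runs)
print(f"average failures over {days} days: {mean:.2f}")   # should be close to 10
```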

Common examples of Poisson processes are customers calling a help center, visitors to a website, radioactive decay in atoms, photons arriving at a space telescope, and movements in a stock price. Poisson processes are generally associated with time, but they do not have to be. In the stock case, we might know the average movements per day (events per time), but we could also have a Poisson process for the number of trees in an acre (events per area).

(One instance frequently given for a Poisson Process is bus arrivals (or trains or now Ubers). However, this is not a true Poisson process because the arrivals are not independent of one another. Even for bus systems that do not run on time, whether or not one bus is late affects the arrival time of the next bus. Jake VanderPlas has a great article on applying a Poisson process to bus arrival times which works better with made-up data than real-world data.)

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, if these events occur with a known constant mean rate and independently of the time since the last event [2]. In other words, it helps us find the probability of a given number of events in a time period, or the probability of waiting some time until the next event (which is the probability of zero events in that time). The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.
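The waiting-time use follows from the zero-event case: the chance of no event in an interval of length t is the Poisson probability of k = 0 with λ = rate × t, which is e^(−λ). A minimal sketch for the website example (one failure per 60 days on average); `prob_no_event` is a hypothetical helper name.

```python
import math

def prob_no_event(rate, t):
    """P(zero events in an interval of length t).

    With lambda = rate * t this is the Poisson pmf at k = 0, e**(-lambda),
    and it equals P(waiting time until the next event > t).
    """
    return math.exp(-rate * t)

rate = 1 / 60                                    # website: one failure per 60 days
print(f"P(no failure in 30 days)       = {prob_no_event(rate, 30):.3f}")       # 0.607
print(f"P(next failure within 30 days) = {1 - prob_no_event(rate, 30):.3f}")   # 0.393
```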

For instance, an individual keeping track of the amount of mail they receive each day may notice that they receive an average of 4 letters per day. If receiving any particular piece of mail does not affect the arrival times of future pieces of mail, i.e., if pieces of mail from a wide range of sources arrive independently of one another, then a reasonable assumption is that the number of pieces of mail received in a day obeys a Poisson distribution [2]. Other examples that may follow a Poisson distribution include the number of phone calls received by a call center per hour and the number of decay events per second from a radioactive source.

The Poisson Distribution probability mass function gives the probability of observing k events in a time period given the length of the period and the average events per time:
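Keeping the average rate and the length of the period as separate factors, the standard form of this PMF is (for k = 0, 1, 2 and so on):

$$
P(k \text{ events in interval}) = e^{-\frac{\text{events}}{\text{time}} \cdot \text{time period}} \, \frac{\left(\frac{\text{events}}{\text{time}} \cdot \text{time period}\right)^{k}}{k!}
$$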

This is a little convoluted, and events/time * time period is usually simplified into a single parameter, λ (lambda), the rate parameter. With this substitution, the Poisson distribution probability mass function has only one parameter:
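With λ = (events/time) × (time period) substituted, the one-parameter form is:

$$
P(k \text{ events in interval}) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}, \qquad k = 0, 1, 2, \ldots
$$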

Lambda can be thought of as the expected number of events in the interval. (We’ll switch to calling this an interval because, remember, we don’t have to use a time period; we could use an area or a volume, depending on our Poisson process.) I like to write out lambda to remind myself that the rate parameter is a function of both the average events per time and the length of the time period, but you’ll most commonly see it written as directly above.
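Writing λ out as a product works the same way in code. This is a minimal sketch using only Python’s `math` module; `poisson_pmf` and its argument names are hypothetical, and the two calls reuse the mail example (4 letters per day over one day, so λ = 4) and the website example (1/60 failures per day over 600 days, so λ = 10).

```python
import math

def poisson_pmf(k, events_per_time, time_period):
    """P(exactly k events in the interval) under a Poisson distribution.

    lambda = events_per_time * time_period is the expected number of events.
    """
    lam = events_per_time * time_period
    return math.exp(-lam) * lam ** k / math.factorial(k)

print(f"P(4 letters in one day)    = {poisson_pmf(4, 4, 1):.4f}")          # 0.1954
print(f"P(10 failures in 600 days) = {poisson_pmf(10, 1 / 60, 600):.4f}")  # 0.1251
```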

As we change the rate parameter, λ, we change the probability of seeing different numbers of events in one interval. The graph below shows the probability mass function of the Poisson distribution, that is, the probability of a given number of events occurring in an interval, for different rate parameters.
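The same information can be tabulated numerically. The λ values 1, 4 and 10 below are assumptions chosen to match the examples in this post, and the helper takes λ directly, in the one-parameter form.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

rates = [1, 4, 10]
print("k   " + "  ".join(f"lam={lam:<4}" for lam in rates))
for k in range(0, 21, 2):                      # every other k from 0 to 20
    row = "  ".join(f"{poisson_pmf(k, lam):<8.4f}" for lam in rates)
    print(f"{k:<3} {row}")

# Each distribution is centered on its rate parameter: the mean equals lambda.
for lam in rates:
    mean = sum(k * poisson_pmf(k, lam) for k in range(100))
    print(f"lambda = {lam}: mean = {mean:.3f}")
```

Reading down a column shows the probability mass shifting to larger counts and spreading out as λ grows.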

Notes on Poisson Distribution and Binomial Distribution

A Binomial Distribution models the number of successes we can expect from n independent trials, each with success probability p. The Poisson Distribution is the limiting case of the Binomial Distribution as n goes to infinity while the expected number of successes, np, remains fixed. The Poisson is therefore used as an approximation of the Binomial when n is large and p is small.
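The limit can be watched directly by holding np fixed and growing n. This sketch assumes np = 4 (an illustrative value) and reports the largest pointwise difference between the Binomial(n, 4/n) and Poisson(4) pmfs; the helper names and the cutoff `k_max` are hypothetical choices made so that large binomial coefficients stay within float range.

```python
import math

def binom_pmf(k, n, p):
    """P(k successes in n trials, each with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(k events) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def max_pmf_gap(n, lam, k_max=50):
    """Largest |Binomial(n, lam/n) - Poisson(lam)| over k = 0..min(n, k_max).

    For lam = 4, counts above 50 are so unlikely under both distributions
    that skipping them does not change the maximum.
    """
    p = lam / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(min(n, k_max) + 1))

lam = 4                                   # expected successes, held fixed
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}, p = {lam / n:.4f}: max pmf difference = {max_pmf_gap(n, lam):.2e}")
```

The difference shrinks steadily as n grows and p = λ/n shrinks, which is the "n large, p small" condition in action.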

As with many ideas in statistics, “large” and “small” are up to interpretation. A rule of thumb is that the Poisson distribution is a decent approximation of the Binomial if n > 20 and np < 10. Therefore, a coin flip, even for 100 trials, should be modeled as a Binomial because np = 50. A call center which gets 1 call every 30 minutes, observed over 120 minutes, could be modeled with a Poisson distribution since np = 120/30 = 4. One important distinction is that a Binomial counts successes in a fixed, finite number of trials, so its count can never exceed n, while a Poisson arises from a theoretically infinite number of trials over a continuous interval, so its count has no upper bound. This is only an approximation; remember, all models are wrong, but some are useful!
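The rule of thumb can be probed by comparing the two pmfs directly. This sketch assumes the call center’s 120 minutes are split into 120 one-minute trials, each with call probability 1/30 (a hypothetical discretization giving np = 4), and compares the largest pointwise gap with that of the 100-coin-flip case (np = 50).

```python
import math

def binom_pmf(k, n, p):
    """P(k successes in n trials, each with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(k events) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def max_pmf_gap(n, p):
    """Largest |Binomial(n, p) - Poisson(n * p)| over k = 0..n."""
    lam = n * p
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))

coin = max_pmf_gap(100, 0.5)        # 100 fair coin flips: np = 50
calls = max_pmf_gap(120, 1 / 30)    # 120 one-minute slots, 1 call per 30 min: np = 4
print(f"coin flips : np = 50, max pmf difference = {coin:.4f}")
print(f"call center: np = 4,  max pmf difference = {calls:.4f}")
```

Under these assumptions the call-center gap comes out several times smaller than the coin-flip gap, which is consistent with the rule of thumb.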

Statistics 2020-2021

MSc Cybersecurity, Sapienza University


References

