4.4. Probability Distributions

Recall that discrete random variables take integer values, whereas a continuous random variable can take any real number. Here we define common discrete and continuous probability distributions with an eye toward computing probabilities.

4.4.1. Discrete Distributions

Here are some examples of discrete random variables.

  • The number of times that heads might appear in 100 coin tosses.
  • The number of coin tosses needed to get 1 head.
  • The number of customers entering a store in 1 hour.
  • The number of products with a defect coming out of a machine each day.

The probability mass function (PMF), p(x_k) = P(X = x_k), defines the set of probabilities for the random variable taking each of its possible values x_k.

The expected value of a discrete random variable represents the long-run average value (mean).

E[X] = \sum_{k = 1}^n x_k\,P(X = x_k)

The variance of a discrete random variable is

Var[X] =  \sum_{k = 1}^n x_k^2\,P(X = x_k) - E[X]^2.
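For example, let X be the number of heads in two tosses of a fair coin, so p(0) = 1/4, p(1) = 1/2, and p(2) = 1/4. Then

E[X] = 0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} = 1

Var[X] = \left(0^2 \cdot \frac{1}{4} + 1^2 \cdot \frac{1}{2}
+ 2^2 \cdot \frac{1}{4}\right) - 1^2 = \frac{3}{2} - 1 = \frac{1}{2}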

Bernoulli random variable

A trial that takes one of two outcomes, such as success or failure, head or tail, or true or false, has a Bernoulli distribution.

\begin{array}{rl}
p(0) &= P\{X = 0\} = 1 - p, \; \; (0 \leq p \leq 1) \\
p(1) &= P\{X = 1\} = p \end{array}

E[X] = p

Since X takes only the values zero and one, and 0^2 = 0 and 1^2 = 1, we have E[X^2] = E[X] = p.

Var[X] = p - p^2 = p (1 - p)

Binomial random variable

When the variable X represents the number of successes in n independent Bernoulli trials, each with success probability p, then X has a Binomial distribution with parameters (n, p), denoted X \sim Bin(n, p).

We need to introduce a new operator to describe the probability distribution. The construct {n \choose k} is called n choose k and is the number of different combinations of k objects that can be chosen from a set of n objects. For example, there are three ways that three (n = 3) coin tosses can yield two heads (k = 2): {(H H T), (H T H), (T H H)}.

{n \choose k} = \frac{n!}{(n - k)!\, k!}

{3 \choose 2} = \frac{3!}{1!\,2!} = \frac{(3 \times 2 \times 1)}
{(1) \times (2 \times 1)} = 3

p(k) = P(X = k) = {n \choose k} p^k (1 - p)^{(n - k)},
\; \; k = 0, 1, 2, \ldots, n

MATLAB has a function called nchoosek.

E[X] = \sum_{k = 0}^n k\,{n \choose k} p^k (1 - p)^{(n - k)} =
n\,p

Var[X] = n\,p\,(1 - p)

For our example of three fair (p = 0.5) coin tosses:

>> pbin = @(n, k) nchoosek(n, k)*(0.5^k)*(0.5^(n - k));

% Find probabilities for number of heads (0 to 3).
>> p = zeros(1, 4);
>> for k = 0:3
p(k+1) = pbin(3, k);
end
>> p
p =
    0.1250    0.3750    0.3750    0.1250

% Expected value
>> E_x = sum((0:3).*p)
E_x =
    1.5000

When p is close to 0.5, the probability mass is symmetric, as shown above, but it is skewed when p is closer to 0 or 1. The probability mass of binomial variables resembles a discrete version of a normal distribution. We see this in Statistical Significance, where we use the central limit theorem with Bernoulli events that sum to give a binomial distribution and are ultimately shifted and scaled to a standard normal distribution.
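To see the skew, we can extend the earlier anonymous function with one (here named pbin2, our own helper) that also takes p as an argument, and evaluate it for p = 0.1:

>> pbin2 = @(n, k, p) nchoosek(n, k)*(p^k)*((1 - p)^(n - k));
>> arrayfun(@(k) pbin2(3, k, 0.1), 0:3)
ans =
    0.7290    0.2430    0.0270    0.0010

Most of the probability mass now sits at k = 0.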

For help identifying all of the possible permutations of n events, MATLAB has a perms function that will list them.
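For example, each row of the matrix returned by perms([1 2 3]) is one of the 3! = 6 possible orderings:

>> P = perms([1 2 3]);  % one ordering per row
>> size(P)
ans =
     6     3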

Uniform discrete random variable

Some experiments may have multiple discrete outcomes that are each equally likely. The most obvious example is the throw of a six-sided die, which has six possible outcomes, each with probability 1/6.

E[X] = \frac{a + b}{2}

Var[X] = \frac{n^2 - 1}{12}

The variables a and b represent the minimum and maximum values, and n represents how many different outcomes are available (n = b - a + 1 for consecutive integer outcomes).
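For the throw of a six-sided die, a = 1, b = 6, and n = 6:

E[X] = \frac{1 + 6}{2} = 3.5

Var[X] = \frac{6^2 - 1}{12} = \frac{35}{12} \approx 2.92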

Poisson random variable

The Poisson distribution is for random variables that count the number of events occurring within a fixed unit, such as a quantity or a time span. A special property of Poisson random variables is that their values do not have a fixed upper limit. Some examples might be the number of defects per 1,000 items produced, the number of customers that a bank teller helps per hour, or the number of telephone calls processed by a switching system per hour. It is denoted as X \sim Poi(\lambda). An example Poisson distribution probability mass plot is shown in Fig. 4.3 for \lambda = 3. Note that 0! = 1, so k = 0 is defined in the PMF.

p(k) = P(X = k) = \frac{\lambda^k\,e^{-\lambda}}{k!},
\; k = 0, 1, 2, \ldots, \; \lambda > 0

E[X] = \lambda

Var[X] = \lambda

>> poi = @(l, k) l.^k .* exp(-l) ./ factorial(k);

>> k = 0:10;
>> lambda = 3;
>> p = poi(lambda, k);
>> stem(k, p)
>> title('Poisson Distribution, \lambda = 3')
>> xlabel('k')
>> ylabel('p(k)')

Fig. 4.3 Poisson distribution probability mass
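Using the poi function defined above, the probability of seeing at most two events when \lambda = 3 is the sum of the first three terms of the PMF:

>> sum(poi(3, 0:2))
ans =
    0.4232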

4.4.2. Continuous Distributions

Since a random variable following a continuous distribution can take on any real number, the probability of the variable equaling a specific value is zero, P(X = a) = 0. Instead, a probability is specified only for a range of values, P(a < X < b). Therefore, continuous variables do not have probability mass functions; instead, we use a probability density function (PDF), f(x). The mean and variance are found from integrals of the PDF.

E[X] = \int_{-\infty}^\infty x\,f(x) \: dx

E\left[X^n\right] = \int_{-\infty}^\infty x^n\,f(x) \: dx

Var[X] = E\left[(X - E[X])^2\right] = E\left[X^2\right] - [E(X)]^2

The area under a region of the PDF, a definite integral, defines a probability. The total area under a PDF must be one. A useful tool for computing probabilities is the cumulative distribution function (CDF), which gives the probability F(a) = P(X < a) by integrating the PDF. A CDF plot is always zero at the far left and one at the far right.

\int_{-\infty}^\infty f(x) \: dx = 1

F(a) = \int_{-\infty}^a f(x) \: dx

The CDF simplifies probability computations.

  • P(X < a) = \int_{-\infty}^a f(x) \: dx = F(a)
  • P(a < X < b) = \int_a^b f(x) \: dx = F(b) - F(a)
  • P(X > a) = \int_a^{\infty}f(x) \: dx = 1 - F(a)

Uniform random variable

A random variable with a Uniform distribution is equally likely to take any value within an interval [a, b]. The PDF and CDF for X \sim U(2, 5) are shown in Fig. 4.4.


Fig. 4.4 Uniform distribution PDF and CDF

f(x) = \left\{ \begin{array}{ll}
\frac{1}{b - a}, &a \leq x \leq b \\
0, &\text{otherwise} \end{array} \right.

F(c) = \left\{ \begin{array}{ll}
0, &c < a \\
\frac{c - a}{b - a}, &a \leq c \leq b \\
1, &c > b \end{array} \right.

E[X] = \frac{a + b}{2}

Var[X] = \frac{(b - a)^2}{12}

function F = unifcdf(c, varargin)
% UNIFCDF - CDF of continuous uniform distribution U(a, b)
%
%  Inputs: c - variable in question -- may be a scalar or row vector
%          a - minimum value of distribution
%          b - maximum value of distribution
%        unifcdf(c) is the same as unifcdf(c, 0, 1)

    if length(varargin) == 0
        a = 0;
        b = 1;
    else
        a = varargin{1};
        b = varargin{2};
    end
    F = zeros(1, length(c));
    inRange = (c >= a) & (c <= b);
    F(inRange) = (c(inRange) - a)/(b - a);
    F(c > b) = 1;
end
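A quick check of unifcdf for X \sim U(2, 5), with one value below, one inside, and one above the interval:

>> unifcdf([1 3 6], 2, 5)
ans =
         0    0.3333    1.0000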
Exponential random variable

Exponential random variables model non-negative data where values of zero or close to zero are common and the probability decreases as the value gets larger. For example, the total value of the coins in a person’s pocket has an exponential distribution. In such an experiment, many people will not have any coins in their pockets, and only a few people will have pocket change of more than a dollar. The exponential distribution is also used to describe random variables that are lengths of time, such as the length of time between events or the duration of events. We might measure the length of time between customers entering a store or how long it takes to help each customer. An example exponential distribution PDF and CDF are shown in Fig. 4.5.

The exponential distribution is often used together with the Poisson distribution to predict capacity requirements. For example, in the telecommunications industry, the expected number of telephone calls handled by a switching system per hour is modeled with a Poisson distribution, while the length of the calls is modeled with an exponential distribution. The statistical model for the number and length of calls is used to determine the capacity requirements of the equipment. A similar analysis could predict the number of employees needed at a business.

f(x) = \left\{ \begin{array}{ll}
\lambda\,e^{-\lambda\,x}, &x > 0 \\
                       0, &x \leq 0
                       \end{array} \right.

F(a) = \left\{ \begin{array}{ll}
1 - e^{-\lambda\,a}, &a > 0 \\
0, &a \leq 0 \end{array} \right.

E[X] = \frac{1}{\lambda}

Var[X] = \frac{1}{\lambda^2}


Fig. 4.5 Exponential distribution
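As a hypothetical example, suppose the time to help each customer averages 2 minutes, so \lambda = 1/2 per minute. The probability that helping a customer takes more than 4 minutes is P(X > 4) = 1 - F(4) = e^{-4\,\lambda}:

>> lambda = 1/2;
>> P_gt4 = exp(-4*lambda)
P_gt4 =
    0.1353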

Normal random variable

The normal distribution, also called the Gaussian distribution, models random variables that occur widely in nature and society. Random variables with a normal distribution include measurements (length, width, height, volume, weight, etc.) of plants and animals, scores on standardized tests, income levels, air pollution levels, etc. Its probability density function has the familiar bell-shaped curve. We denote the distribution as X \sim N(\mu, \sigma).

f(x) = \frac{1}{\sigma\,\sqrt{2\,\pi}}
e^{\frac{-(x - \mu)^2}{2\,\sigma^2}}

To simplify things, we will map the random variable to a standard normal distribution with zero mean and unit variance, Y \sim N(0, 1).

y = \frac{x - \mu}{\sigma}

f(y) = \frac{1}{\sqrt{2\,\pi}} e^{\frac{-y^2}{2}}

Computing the CDF requires numerical integration because the PDF has no closed-form antiderivative. Fortunately, there are built-in MATLAB functions that we can use to compute the integral. See the documentation for functions erf and erfc. The Statistics and Machine Learning Toolbox includes a function to compute the CDF, but implementing our own function using erfc is not difficult.

We desire a function called normcdf that behaves as follows for distribution Y \sim N(0, 1). A plot of the needed area calculation is shown in Fig. 4.6. A plot of the CDF is shown in Fig. 4.7.

\text{normcdf(a)} = P(Y < a) = F(a) = \frac{1}{\sqrt{2\,\pi}}
\int_{-\infty}^a e^{\frac{-y^2}{2}} \: dy

The erfc(b) function gives us the following definite integral.

\text{erfc(b)} = \frac{2}{\sqrt{\pi}}
\int_b^\infty e^{-y^2} \: dy.

So we make three adjustments: multiply by 1/2; change the sign of the boundary value, since erfc integrates from the boundary to infinity rather than from negative infinity to the boundary; and divide the boundary value by \sqrt{2}, because the squared integration variable in the erfc integrand is not divided by 2 as ours is. With those adjustments, erfc computes an area equivalent to the one we need.

Note that because of the symmetry of the PDF curve about the mean, P(Y < a) is the same as P(Y > -a).
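The substitution u = y/\sqrt{2}, with dy = \sqrt{2}\,du, together with this symmetry, makes the equivalence explicit:

\frac{1}{\sqrt{2\,\pi}} \int_{-\infty}^a e^{\frac{-y^2}{2}} \: dy
= \frac{1}{\sqrt{\pi}} \int_{-\infty}^{a/\sqrt{2}} e^{-u^2} \: du
= \frac{1}{\sqrt{\pi}} \int_{-a/\sqrt{2}}^{\infty} e^{-u^2} \: du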

\text{normcdf(a)} = F(a) = \frac{1}{2} \left(
\frac{2}{\sqrt{\pi}} \int_{-a/\sqrt{2}}^\infty e^{-y^2} \: dy \right)
   = \frac{1}{2} \text{erfc(}-a/\sqrt{2}\text{)}


Fig. 4.6 The shaded area is P(Y < a).

function F = normcdf(a, varargin)
% NORMCDF - cumulative distribution function for normal distribution
%   Inputs: a  - variable in question -- may be a scalar or row vector
%           mu - mean of distribution
%           sigma - standard deviation
%      normcdf(a) is the same as normcdf(a, 0, 1) - standard normal
%
%   Output: Probability P(X < a)

    if length(varargin) ~= 0
        mu = varargin{1};
        sigma = varargin{2};
        a = (a - mu)./sigma;
    end
    F = erfc(-a/sqrt(2))/2;
end
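A couple of quick sanity checks: the standard normal CDF should be 0.5 at the mean and about 0.975 at 1.96 standard deviations above the mean.

>> normcdf(0)
ans =
    0.5000
>> normcdf(1.96)
ans =
    0.9750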

To plot the CDF:

>> a = linspace(-3,3);
>> plot(a, normcdf(a))

Fig. 4.7 CDF for X ~ N(0, 1)

Probability Example:

You learn that the lot of calves to be auctioned at the local livestock sale have a mean weight of 500 pounds with a standard deviation of 150 pounds. What fraction of the calves likely weigh:

  1. less than 400 pounds?
  2. more than 700 pounds?
  3. between 450 and 550 pounds?
>> % less than 400 pounds
>> less400 = normcdf(400, 500, 150)
less400 =
    0.2525

>> % more than 700 pounds
>> more700 = 1 - normcdf(700, 500, 150)
more700 =
    0.0912

>> % between 450 and 550 pounds
>> middle100 = normcdf(550, 500, 150) - ...
normcdf(450, 500, 150)
middle100 =
    0.2611