4.4. Probability Distributions

Recall that discrete random variables take integer values, whereas continuous random variables can take any real number. Here, we define common discrete and continuous probability distributions with an eye toward computing probabilities.

4.4.1. Discrete Distributions

Here are some examples of discrete random variables.

  • The number of times that heads might appear in 100 coin tosses.

  • The number of coin tosses needed to get one head.

  • The number of customers entering a store in one hour.

  • The number of products with a defect coming out of a machine each day.

The probability mass function (PMF), \(p(x_k) = P(X = x_k)\), defines the set of probabilities for the random variable taking each of its possible values. The expected value of a discrete random variable represents the long-run average value (mean).

\[E[X] = \sum_{k = 1}^n x_k\,P(X = x_k)\]

The variance of a discrete random variable is

\[Var[X] = \sum_{k = 1}^n x_k^2\,P(X = x_k) - E[X]^2.\]
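
For example, a short MATLAB sketch computes both quantities directly from these definitions for a hypothetical PMF, here a fair six-sided die:

>> x = 1:6;                        % possible values
>> p = ones(1, 6)/6;               % PMF: each value equally likely
>> E_x = sum(x.*p)                 % expected value
E_x =
    3.5000
>> V_x = sum(x.^2.*p) - E_x^2      % variance
V_x =
    2.9167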

Bernoulli random variable

A trial with exactly two possible outcomes, such as success/failure, head/tail, or true/false, has a Bernoulli distribution.

\[\begin{split}\begin{array}{rl} p(0) &= P\{X = 0\} = 1 - p, \; \; (0 \leq p \leq 1) \\ p(1) &= P\{X = 1\} = p \end{array}\end{split}\]
\[E[X] = p\]

Since \(X\) has only two values, zero and one, \(E[X^2] = E[X] = p\).

\[Var[X] = p - p^2 = p (1 - p)\]
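
As a numerical sketch, we can simulate many Bernoulli trials and check the sample mean and variance against \(p\) and \(p(1 - p)\). The value \(p = 0.3\) here is an arbitrary choice, and the results vary slightly between runs.

>> p = 0.3;
>> x = double(rand(1, 1e6) < p);   % one million Bernoulli trials
>> sample_mean = mean(x);          % close to p = 0.3
>> sample_var = var(x);            % close to p*(1 - p) = 0.21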

Binomial random variable

The number of successes \(X\) in \(n\) independent Bernoulli trials has a Binomial distribution with parameters (\(n, p\)), denoted \(X \sim Bin(n, p)\).

We need to introduce a new operator to describe the probability distribution. The construct \(\binom{n}{k}\) is called \(n\)–choose–\(k\) and is the number of combinations of \(k\) objects that can be chosen from a set of \(n\) objects. For example, there are three ways that three (\(n = 3\)) coin tosses can yield two heads (\(k = 2\)): {(H H T), (H T H), (T H H)}.

\[\binom{n}{k} = \frac{n!}{(n - k)!\, k!}\]
\[\binom{3}{2} = \frac{3!}{1!\,2!} = \frac{(3 \times 2 \times 1)} {(1) \times (2 \times 1)} = 3\]

MATLAB has a function called nchoosek.
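
For example, we can verify the count from the coin-toss example:

>> nchoosek(3, 2)
ans =
     3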

\[p(k) = P(X = k) = \binom{n}{k} p^k (1 - p)^{(n - k)}, \; \; k = 0, 1, 2, \ldots n\]
\[E[X] = \sum_{k = 0}^n k\,\binom{n}{k}\, p^k (1 - p)^{(n - k)} = n\,p\]
\[Var[X] = n\,p\,(1 - p)\]

For our example of three fair (\(p = 0.5\)) coin tosses:

>> pbin = @(n, k) nchoosek(n, k)*(0.5^k)*(0.5^(n - k));

>> % Find probabilities for number of heads (0 to 3).
>> p = zeros(1, 4);
>> for k = 0:3
p(k+1) = pbin(3, k);
end
>> p
p =
    0.1250    0.3750    0.3750    0.1250

>> % Expected value
>> E_x = sum((0:3).*p)
E_x =
    1.5000
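
We can also compute the variance from its definition and compare it to the formula \(n\,p\,(1 - p) = 3 \times 0.5 \times 0.5 = 0.75\).

>> % Variance: E[X^2] - E[X]^2
>> V_x = sum((0:3).^2 .* p) - E_x^2
V_x =
    0.7500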

When \(p\) is close to 0.5, the probability mass is symmetric, as shown above, but it is skewed when \(p\) is closer to 0 or 1. The probability mass of a binomial variable resembles a discrete version of a normal distribution. We see this in Statistical Significance, where we use the central limit theorem with Bernoulli events that sum to give a binomial distribution and are ultimately shifted and scaled to a standard normal distribution.

For help identifying all of the possible orderings of \(n\) outcomes, MATLAB has a perms function that will list them, as shown below.
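
For example, perms lists all \(3! = 6\) permutations of the characters 'HHT', and unique removes the duplicate rows caused by the repeated H, leaving the three distinct orderings of two heads in three tosses.

>> orderings = unique(perms('HHT'), 'rows');  % rows: HHT, HTH, THH
>> size(orderings, 1)
ans =
     3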

Uniform discrete random variable

Some experiments may have multiple discrete outcomes that are each equally likely. An obvious example is the throw of a six-sided die, which has six possible outcomes, each with a probability of \(1/6\).

\[E[X] = \frac{a + b}{2}\]
\[Var[X] = \frac{n^2 - 1}{12}\]

The variables \(a\) and \(b\) represent the minimum and maximum values, and \(n = b - a + 1\) is the number of equally likely outcomes.
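
For the six-sided die (\(a = 1\), \(b = 6\), \(n = 6\)), these formulas reproduce the values computed directly from the PMF earlier in this section:

>> (1 + 6)/2       % E[X]
ans =
    3.5000
>> (6^2 - 1)/12    % Var[X]
ans =
    2.9167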

Poisson random variable

The Poisson distribution is for random variables that count the number of events that might occur within a fixed unit, such as a quantity of items, a region of space, or a period of time. A special property of Poisson random variables is that their values do not have a fixed upper limit. Some examples might be the number of defects per 1,000 items produced, the number of customers that a bank teller helps per hour, or the number of telephone calls processed by a switching system per hour. It is denoted as \(X \sim Poi(\lambda)\). An example Poisson distribution probability mass plot is shown in Fig. 4.3 for \(\lambda = 3\). Note that \(0! = 1\), so \(k = 0\) is defined in the PMF.

\[p(k) = P(X = k) = \frac{\lambda^k\,e^{-\lambda}}{k!}, \; k = 0, 1, 2, \ldots, \; \lambda > 0\]
\[E[X] = \lambda\]
\[Var[X] = \lambda\]
>> poi = @(l, k) l.^k .* exp(-l) ./ factorial(k);

>> k = 0:10;
>> lambda = 3;
>> p = poi(lambda, k);
>> stem(k, p)
>> title('Poisson Distribution, \lambda = 3')
>> xlabel('k')
>> ylabel('p(k)')

Fig. 4.3 Poisson distribution probability mass
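
As a quick check, we can verify numerically that the PMF sums to one and that its mean equals \(\lambda\) by summing over a sufficiently wide range of \(k\).

>> kk = 0:100;
>> total = sum(poi(lambda, kk));      % approximately 1
>> mu = sum(kk .* poi(lambda, kk));   % approximately lambda = 3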

4.4.2. Continuous Distributions

Since a random variable following a continuous distribution can take on any real number, the probability of the variable being a specific value is zero, \(P(X = a) = 0\). A probability is instead specified for a range of values, \(P(a < X < b)\). Therefore, continuous variables do not have probability mass functions; we use a probability density function (PDF), \(f(x)\). The mean and variance are found from integrals of the PDF.

\[E[X] = \int_{-\infty}^\infty x\,f(x) \: dx\]
\[E\left[X^n\right] = \int_{-\infty}^\infty x^n\,f(x) \: dx\]
\[Var[X] = E\left[(X - E[X])^2\right] = E\left[X^2\right] - [E(X)]^2\]

The area under a region of the PDF, a definite integral, defines a probability. The total area under a PDF must be one. A valuable tool for computing probabilities is the cumulative distribution function (CDF), \(F(a) = P(X < a)\), which gives the integral of the PDF from \(-\infty\) to \(a\). A CDF plot rises from zero at the far left to one at the far right.

\[\int_{-\infty}^\infty f(x) \: dx = 1\]
\[F(a) = \int_{-\infty}^a f(x) \: dx\]

The CDF simplifies probability computations.

  • \(P(X < a) = \int_{-\infty}^a f(x) \: dx = F(a)\)

  • \(P(a < X < b) = \int_a^b f(x) \: dx = F(b) - F(a)\)

  • \(P(X > a) = \int_a^{\infty}f(x) \: dx = 1 - F(a)\)

Note that for any continuous random variable \(X\), \(P(X = a) = 0\), therefore \(P(X \leq a) = P(X < a)\).
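
As a numerical illustration of these relationships, MATLAB's integral function can evaluate the definite integrals directly. Here we compute \(P(X < 1)\) for the standard normal PDF, which is introduced later in this section; the result matches the known value \(F(1) \approx 0.8413\).

>> f = @(x) exp(-x.^2/2)/sqrt(2*pi);  % standard normal PDF
>> integral(f, -Inf, 1)               % P(X < 1) = F(1)
ans =
    0.8413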

Uniform random variable

A random variable with a Uniform distribution is equally likely to take any value within an interval \([a, b]\). The PDF and CDF for \(X \sim U(2, 5)\) are shown in Fig. 4.4. The code for a CDF function is listed in unifcdf.m.

Fig. 4.4 Uniform distribution PDF and CDF

\[\begin{split}f(x) = \left\{ \begin{array}{ll} \frac{1}{b - a}, &a \leq x \leq b \\ 0, &\text{otherwise} \end{array} \right.\end{split}\]
\[\begin{split}F(c) = \left\{ \begin{array}{ll} 0, &c < a \\ \frac{c - a}{b - a}, &a \leq c \leq b \\ 1, &c > b \end{array} \right.\end{split}\]
\[E[X] = \frac{a + b}{2}\]
\[Var[X] = \frac{(b - a)^2}{12}\]
function F = unifcdf(c, varargin)
% UNIFCDF - CDF of continuous uniform distribution U(a, b)
%
%  Inputs: c - variable in question -- may be a scalar or row vector
%          a - minimum value of distribution
%          b - maximum value of distribution
%        unifcdf(c) is the same as unifcdf(c, 0, 1)

    if isempty(varargin)
        a = 0;
        b = 1;
    else
        a = varargin{1};
        b = varargin{2};
    end
    F = zeros(size(c));
    inRange = (c >= a) & (c <= b);
    F(inRange) = (c(inRange) - a)/(b - a);
    F(c > b) = 1;
end
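
For example, for \(X \sim U(2, 5)\):

>> unifcdf([1 3 6], 2, 5)
ans =
         0    0.3333    1.0000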

Exponential random variable

Exponential random variables model non-negative data in which values at or near zero are common and the probability decreases as the value grows larger. For example, the total value of the coins in a person’s pocket has an exponential distribution: many people will not have any coins in their pockets, and only a few people will have pocket change of more than a dollar. The exponential distribution also describes random variables that are lengths of time, such as the time between events or the duration of events. We might measure the time between customers entering a store or how long it takes to help each customer. Example PDF and CDF plots for the exponential distribution are shown in Fig. 4.5.

The exponential distribution is often used with the Poisson distribution to predict capacity requirements. For example, in the telecommunications industry, the Poisson distribution models the expected number of telephone calls to be handled by a switching system per hour, while the exponential distribution models the length of each call. The statistical models for the number and length of calls are used to determine the equipment’s call handling capacity requirements. A similar analysis could prescribe the number of employees needed at a business.

\[\begin{split}f(x) = \left\{ \begin{array}{ll} \lambda\,e^{-\lambda\,x}, &x > 0 \\ 0, &x \leq 0 \end{array} \right.\end{split}\]
\[\begin{split}F(a) = \left\{ \begin{array}{ll} 1 - e^{-\lambda\,a}, &a > 0 \\ 0, &a \leq 0 \end{array} \right.\end{split}\]
\[E[X] = \frac{1}{\lambda}\]
\[Var[X] = \frac{1}{\lambda^2}\]

Fig. 4.5 Exponential distribution
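
Like the uniform distribution, the exponential CDF has a closed form, so a helper function in the style of unifcdf is easy to write. The function below is a minimal sketch parameterized by the rate \(\lambda\); note that the Statistics and Machine Learning Toolbox has its own expcdf, which is parameterized by the mean \(1/\lambda\) instead, so we use a different name here.

function F = expocdf(a, lambda)
% EXPOCDF - CDF of the exponential distribution, P(X < a)
%   Inputs: a      - variable in question, scalar or array
%           lambda - rate parameter of the distribution

    F = zeros(size(a));
    pos = a > 0;
    F(pos) = 1 - exp(-lambda*a(pos));
end

For example, with \(\lambda = 1/2\), \(P(X < 2) = 1 - e^{-1}\):

>> expocdf(2, 0.5)
ans =
    0.6321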

Normal random variable

The normal distribution, also called the Gaussian distribution, models random variables that occur widely in nature and society. Random variables with a normal distribution include measurements (length, width, height, volume, weight, etc.) of plants and animals, scores on standardized tests, income levels, air pollution levels, etc. Its probability density function has the familiar bell-shaped curve. We denote the distribution as \(X \sim N(\mu, \sigma^2)\).

\[f(x) = \frac{1}{\sqrt{2\,\pi\,\sigma^2}} e^{\frac{-(x - \mu)^2}{2\,\sigma^2}}\]

To simplify things, we will map the random variable to a standard normal distribution with zero mean and unit variance, \(Y \sim N(0, 1)\).

\[y = \frac{x - \mu}{\sigma}\]
\[f(y) = \frac{1}{\sqrt{2\,\pi}} e^{\frac{-y^2}{2}}\]

Computing the CDF requires numerical integration because the PDF has no closed-form integral solution. Fortunately, we can use a built-in MATLAB function to compute the integral. See the documentation for functions erf and erfc. The Statistics and Machine Learning Toolbox includes a function to compute the CDF, but implementing a function using erfc is not difficult.

We desire a function called normcdf that behaves as follows for distribution \(Y \sim N(0, 1)\). The needed area calculation is plotted in Fig. 4.6. A plot of the CDF is shown in Fig. 4.7.

Fig. 4.6 The shaded area is P(Y < a).

>> a = linspace(-3,3);
>> plot(a, normcdf(a))

Fig. 4.7 CDF for \(Y \sim N(0, 1)\)

\[\text{normcdf(a)} = P(Y < a) = F(a) = \frac{1}{\sqrt{2\,\pi}} \int_{-\infty}^a e^{\frac{-y^2}{2}} \: dy\]

The erfc(b) function gives us the following definite integral.

\[\text{erfc(b)} = \frac{2}{\sqrt{\pi}} \int_b^\infty e^{-y^2} \: dy.\]

So we multiply by \(1/2\); change the sign of the boundary value because erfc integrates from the boundary to infinity rather than from negative infinity to the boundary, which is equivalent by symmetry; and divide the boundary value by \(\sqrt{2}\) because the squared integration variable in the erfc integrand is not divided by 2 as we need. The integration then computes an area equivalent to the one we need. The code for a normal distribution CDF is listed in normcdf.m.

Note that because of the symmetry of the PDF curve about the mean, \(P(Y < a)\) is the same as \(P(Y > -a)\).

\[\text{normcdf(a)} = F(a) = \frac{1}{2} \left( \frac{2}{\sqrt{\pi}} \int_{-a/\sqrt{2}}^\infty e^{-y^2} \: dy \right) = \frac{1}{2} \text{erfc}(-a/\sqrt{2})\]
function F = normcdf(a, varargin)
% NORMCDF - CDF for normal distribution
%   Inputs: a  - variable in question
%              - may be a scalar or row vector
%           mu - mean of distribution
%           sigma - standard deviation
%      normcdf(a) is the same as normcdf(a, 0, 1)
%
%   Output: Probability P(X < a)

    if ~isempty(varargin)
        mu = varargin{1};
        sigma = varargin{2};
        a = (a - mu)./sigma;
    end
    F = erfc(-a/sqrt(2))/2;
end
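
A couple of spot checks against known values of the standard normal CDF confirm the function’s behavior:

>> normcdf(0)      % the mean splits the area in half
ans =
    0.5000
>> normcdf(1.96)   % the 97.5th percentile
ans =
    0.9750
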
Probability Example:

You learn that calves that are to be auctioned at the local livestock sale have weights that are normally distributed with a mean of 500 pounds and a standard deviation of 150 pounds. What fraction of the calves likely weigh:

  1. Less than 400 pounds?

  2. More than 700 pounds?

  3. Between 450 and 550 pounds?

>> % less than 400 pounds
>> less400 = normcdf(400, 500, 150)
less400 =
    0.2525

>> % more than 700 pounds
>> more700 = 1 - normcdf(700, 500, 150)
more700 =
    0.0912

>> % between 450 and 550 pounds
>> middle100 = normcdf(550, 500, 150) - normcdf(450, 500, 150)
middle100 =
    0.2611
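
A short Monte Carlo simulation offers a sanity check; this is a sketch whose results vary slightly from run to run, but each fraction should land near the computed probability.

>> w = 500 + 150*randn(1, 1e6);        % simulated calf weights
>> frac400 = mean(w < 400);            % approximately 0.2525
>> frac700 = mean(w > 700);            % approximately 0.0912
>> fracMid = mean(w > 450 & w < 550);  % approximately 0.2611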