4.8. Central Limit Theorem

We know that many random variables naturally fit the normal distribution model. It turns out that the means and sums of samples drawn from other distributions are also approximately normal, a result known as the central limit theorem.

From a population that has mean \mu and variance \sigma^2, draw m samples, each of size n. The central limit theorem says that when n is large, the distributions of the sample means and sample sums are approximately normal regardless of the underlying population distribution. For each sample i of random variable X, let \bar{X_i} be the sample mean and let Y_i be the sample sum.

\begin{array}{ll}
  \bar{X_i} &= \frac{1}{n} \sum_{j = 1}^n x_{i,j} \\ \\
  Y_i &= \sum_{j = 1}^n x_{i,j}
\end{array}

We also define variable Z as follows.

Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}
= \frac{Y - n\mu}{\sigma\sqrt{n}}

Then we can define normal distributions from \bar{X}, Y, and Z.

\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right),
\mbox{  } Y \sim N\left(n\mu, n\sigma^2\right),
\mbox{  and  } Z \sim N(0, 1)
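For example, for a fair die the discrete uniform distribution on \{1, \ldots, 6\} has \mu = 3.5 and \sigma^2 = \frac{6^2 - 1}{12} = \frac{35}{12} \approx 2.92, so with samples of size n = 100 the theorem predicts

\bar{X} \sim N\left(3.5, \frac{35/12}{100}\right) \approx N(3.5, 0.0292)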

Let’s see this in action. We will start with X representing the six sides of a die, so we use a discrete uniform distribution. We will put the data in a 100{\times}100 matrix so that each column is a sample. Then we can find the mean and sum of each column to get new random variables with approximately normal distributions.

>> n = 100;
>> X = randi(6, n);  % 100 x 100
>> X_bar = mean(X);  % 1 x 100
>> mu = mean(X(:))
mu =
    3.4960           % 3.5 expected
>> sigma = std(X(:))
sigma =
    1.7025           % sqrt(35/12) = 1.71 expected

% Make Z ~ N(0, 1)
>> Z = (X_bar - mu)/(sigma/sqrt(n));
>> mean(Z)
ans =
   -1.9895e-15
>> var(Z)           % Z ~ N(0,1)
ans =
    0.9791
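
The column sums give the same standard normal through the second form of the standardization, Z = (Y - n\mu)/(\sigma\sqrt{n}). Continuing the session above (a sketch; the exact values will vary with the random draw):

>> Y = sum(X);       % 1 x 100 column sums
>> Zy = (Y - n*mu)/(sigma*sqrt(n));
>> mean(Zy)          % approximately 0
>> var(Zy)           % approximately 1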

Fig. 4.14 Histograms of X \sim U(1, 6) and Z \sim N(0, 1).