4.8. Central Limit Theorem¶
We know that many random variables are naturally modeled by the normal distribution. It turns out that sample means and sums of random variables drawn from other distributions are also approximately normal, a result known as the central limit theorem (CLT).
Theorem 4.1 (Central Limit Theorem)
From a population with mean \(\mu\) and variance \(\sigma^2\), draw \(m\) sampling sets, each of size \(n\). When \(n\) is large, the distributions of the sample means and sample sums are approximately normal, regardless of the underlying population distribution.
Statistics textbooks, such as the one by Hogg and Craig [HOGG78], contain rigorous proofs of the CLT.
For each sampling \(i\) of the random variable \(X\), let \(\bar{X}_i\) be the sample mean, and let \(Y_i\) be the sample sum.
We also define the standardized variable \(Z\) as follows.

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]

Then from the CLT, \(\bar{X}\), \(Y\), and \(Z\) have approximately normal distributions.
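From the standard properties of means and sums of \(n\) independent samples (stated here with \(N(\text{mean}, \text{variance})\) parameters, which the text above does not spell out), the approximate distributions are

\[ \bar{X}_i \approx N\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad Y_i \approx N\!\left(n\mu, n\sigma^2\right), \qquad Z \approx N(0, 1). \]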
Let’s see the CLT in action. We will start with \(X\) representing the six sides of a die, so we use a discrete uniform distribution. We will put the data in a \(100{\times}100\) matrix so that each column is one sampling. Then we can take the mean (or sum) of each column to get a new random variable whose distribution is approximately normal. Normalized histogram plots of the data and a standard normal PDF are shown in Fig. 4.16.
>> n = 100;
>> X = randi(6, n); % 100 x 100
>> X_bar = mean(X); % 1 x 100
>> mu = mean(X(:))
mu =
3.4960 % 3.5 expected
>> sigma = std(X(:))
sigma =
1.7025 % sqrt(35/12) = 1.71 expected
% Make Z ~ N(0, 1)
>> Z = (X_bar - mu)/(sigma/sqrt(n));
>> mean(Z)
ans =
-1.9895e-15
>> var(Z) % Z ~ N(0,1)
ans =
0.9791
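The CLT applies equally to the sample sums. As a quick sketch of the same check for the column sums (the variable name Z_Y below is only for illustration), note that since \(Y = n\bar{X}\) here, the standardized sums reduce to exactly the same \(Z\).

>> Y = sum(X);                        % 1 x 100 column sums
>> Z_Y = (Y - n*mu)/(sigma*sqrt(n));  % E[Y] = n*mu, Var(Y) = n*sigma^2
>> mean(Z_Y)   % near 0, same values as Z above
>> var(Z_Y)    % near 1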

Fig. 4.16 Histograms of \(X \sim U(1, 6)\) and \(Z \sim N(0, 1)\).¶
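A plot like the right panel of Fig. 4.16 could be produced along the following lines. This is only a sketch using MATLAB's histogram function with PDF normalization and the explicit standard normal formula, not the code used to generate the figure.

>> histogram(Z, 'Normalization', 'pdf')   % empirical PDF of the standardized sample means
>> hold on
>> z = linspace(-4, 4, 200);
>> plot(z, exp(-z.^2/2)/sqrt(2*pi))       % standard normal PDF
>> xlabel('z'), ylabel('PDF')
>> legend('Z', 'N(0, 1)')
>> hold off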