.. _CLT:

Central Limit Theorem
========================

We know that many random variables naturally fit the normal distribution
model. It turns out that statistics computed from random variables with
other distributions, such as sample means and sample sums, can also be
mapped to a normal distribution by the *central limit theorem*.

From a population that has mean :math:`\mu` and finite variance
:math:`\sigma^2`, draw :math:`m` sampling sets, each of size :math:`n`.
The central limit theorem says that when :math:`n` is large, the
distribution of the sample means and of the sample sums is approximately
normal, regardless of the underlying population distribution.

For the :math:`i`-th sampling of random variable :math:`X`, with values
:math:`x_{i,1}, \ldots, x_{i,n}`, let :math:`\bar{X_i}` be the sample mean
and let :math:`Y_i` be the sample sum.

.. math::

   \begin{array}{ll}
   \bar{X_i} &= \frac{1}{n} \sum_{j = 1}^n x_{i,j} \\ \\
   Y_i &= \sum_{j = 1}^n x_{i,j}
   \end{array}

We also define the standardized variable :math:`Z` as follows.

.. math::

   Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}
     = \frac{Y - n\mu}{\sigma\sqrt{n}}

Then, for large :math:`n`, :math:`\bar{X}`, :math:`Y`, and :math:`Z` are
approximately normally distributed.

.. math::

   \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \quad
   Y \sim N\left(n\mu, n\sigma^2\right), \mbox{ and } Z \sim N(0, 1)

Let's see this in action. We start with :math:`X` representing the six
sides of a die, so we use a discrete uniform distribution. We put the data
in a :math:`100{\times}100` matrix so that each column is one sampling
(:math:`m = n = 100`). Then we find the mean and sum of each column to get
new random variables that are approximately normally distributed.

::

   >> n = 100;
   >> X = randi(6, n);       % 100 x 100, each column is one sampling
   >> X_bar = mean(X);       % 1 x 100, column means
   >> Y = sum(X);            % 1 x 100, column sums
   >> mu = mean(X(:))
   mu =
       3.4960                % 3.5 expected
   >> sigma = std(X(:))
   sigma =
       1.7025                % sqrt(35/12) = 1.708 expected
   % Make Z ~ N(0, 1)
   >> Z = (X_bar - mu)/(sigma/sqrt(n));
   >> mean(Z)
   ans =
     -1.9895e-15
   >> var(Z)                 % Z ~ N(0,1)
   ans =
       0.9791

.. _fig:CLTplot:

.. figure:: CLTplot.png
   :align: center
   :width: 60%

   Histograms of :math:`X \sim U(1, 6)` and :math:`Z \sim N(0, 1)`.
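
For the die example, the expected population parameters quoted in the
session follow directly from the discrete uniform distribution on
:math:`\{1, \ldots, 6\}`. A short derivation:

.. math::

   \mu = \frac{1}{6}\sum_{k=1}^{6} k = \frac{21}{6} = 3.5, \qquad
   \sigma^2 = \frac{1}{6}\sum_{k=1}^{6} k^2 - \mu^2
            = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.92

So :math:`35/12` is the variance; the value to compare against the output of
``std`` is :math:`\sigma = \sqrt{35/12} \approx 1.708`, and for
:math:`n = 100` each column mean should have standard deviation about
:math:`\sigma/\sqrt{n} \approx 0.171`.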
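
The two forms of :math:`Z` describe the same variable. Assuming the
:math:`n` draws within a sampling are independent, the sample mean and
sample sum have the moments below, and since :math:`Y = n\bar{X}`,
standardizing the sum gives exactly the standardized mean:

.. math::

   E[\bar{X}] = \mu, \quad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n},
   \qquad E[Y] = n\mu, \quad \mathrm{Var}(Y) = n\sigma^2

   \frac{Y - n\mu}{\sigma\sqrt{n}}
   = \frac{n\bar{X} - n\mu}{\sigma\sqrt{n}}
   = \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma}
   = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}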
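
The identity between the two standardizations can also be checked
numerically. The following is a minimal sketch, assuming independent
fair-die rolls and using the theoretical values :math:`\mu = 3.5` and
:math:`\sigma = \sqrt{35/12}` rather than estimates from the data; the
names ``Z_mean``, ``Z_sum``, ``max_diff``, ``mean_Y``, ``var_Y``, and
``mean_Z`` are illustrative only.

::

   % Sketch: fair-die rolls, assumed independent
   n = 100;
   X = randi(6, n);                 % n x n matrix, each column is one sampling
   X_bar = mean(X);                 % 1 x n sample means
   Y = sum(X);                      % 1 x n sample sums

   mu = 3.5;                        % population mean of a fair die
   sigma = sqrt(35/12);             % population standard deviation, about 1.708

   Z_mean = (X_bar - mu) / (sigma / sqrt(n));   % standardized sample means
   Z_sum  = (Y - n*mu) / (sigma * sqrt(n));     % standardized sample sums

   % Both standardizations give the same Z up to rounding error
   max_diff = max(abs(Z_mean - Z_sum))

   % Sums: mean near n*mu = 350, variance near n*sigma^2 (about 291.7)
   mean_Y = mean(Y)
   var_Y = var(Y)

   % With theoretical mu, mean(Z) is not forced to (nearly) zero
   mean_Z = mean(Z_mean)

Because ``mu`` here is not estimated from the same data, ``mean_Z`` should
fluctuate around 0 with a standard deviation of about :math:`1/\sqrt{m} = 0.1`
instead of coming out near machine precision as in the session above.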
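
To illustrate that the population itself need not be normal, here is a
hedged sketch with a strongly skewed population: exponential draws with
mean 1 and variance 1, generated by inverse-transform sampling as
``-log(rand(...))`` (chosen only because it needs no toolbox). The number of
samplings ``m = 1000`` and the anonymous function ``skew`` are hypothetical
choices defined here, not built-ins. The raw data have skewness about 2,
while the sample means of size :math:`n` have theoretical skewness
:math:`2/\sqrt{n} = 0.2`.

::

   n = 100;                      % sample size per sampling
   m = 1000;                     % number of samplings (hypothetical choice)
   X = -log(rand(n, m));         % n x m exponential(1) draws, one sampling per column
   mu = 1;                       % population mean of exponential(1)
   sigma = 1;                    % population standard deviation of exponential(1)

   Z = (mean(X) - mu) / (sigma / sqrt(n));   % 1 x m standardized sample means

   % Sample skewness (population normalization), defined here as a helper
   skew = @(v) mean((v - mean(v)).^3) / std(v, 1)^3;
   skew_X = skew(X(:))           % about 2 for exponential data
   skew_Z = skew(Z)              % about 2/sqrt(n) = 0.2, much closer to 0
   mean(Z)                       % near 0
   var(Z)                        % near 1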