.. _matrixStats:

Statistics on Matrices
==========================

.. include:: ../replace.txt

Column Statistics
------------------

When a matrix is passed to a statistical function, the default is to work
on each column of the matrix independently. The result is a row vector,
which may in turn be used with other statistical functions if desired.

.. figure:: columnMean.png
    :align: center
    :width: 15%

Some functions return multiple values for each column. For example, the
``diff`` function calculates the difference between adjacent elements in
each column of the input matrix. The ``cumsum`` function also returns a
matrix rather than a row vector.

.. figure:: columnDiff.png
    :align: center
    :width: 15%

Changing Dimension
-------------------

Many statistical functions accept an optional dimension argument that
specifies whether the operation should be applied to the columns
independently (the default) or to the rows. If matrix :math:`\bf{A}` has
size :math:`10{\times}20`, then ``mean(A,2)`` returns a 10-element column
vector, whereas ``mean(A)`` returns a 20-element row vector.

::

    >> A = randi(25,10,20);
    >> mC = mean(A);
    >> mr = mean(A,2);
    >> whos
      Name      Size            Bytes  Class     Attributes

      A        10x20             1600  double
      mC        1x20              160  double
      mr       10x1                80  double

Some functions, such as ``min``, ``max``, and ``diff``, use the second
argument for other purposes, which makes dimension the third argument. To
skip the second argument, pass an empty vector, ``[]`` (a pair of empty
square brackets).

::

    >> Amin = min(A,[],2);

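Because ``diff`` uses its second argument for the order of the difference,
its dimension argument is likewise the third argument, while a function
such as ``sum`` takes the dimension directly as its second argument. The
following short sketch (the variable names are only illustrative)
continues with the same :math:`10{\times}20` matrix :math:`\bf{A}`.

::

    >> rowSums = sum(A, 2);    % 10x1 column vector of row sums
    >> dRows = diff(A, 1, 2);  % first differences along each row: 10x19
    >> dCols = diff(A);        % default: differences down each column: 9x20
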
.. _covariance:

Covariance and Correlation
---------------------------

.. index:: covariance, correlation

**Covariance** shows how distinct variables relate to each other. It is
calculated in the same manner that variance is calculated for a single
variable. Variance (the square of the standard deviation) is the expected
value of the squared difference between each sample and the mean of the
variable. Note that covariance and correlation can be defined either in
terms of populations or samples. We mostly use the notation for sampled
data.

.. math:: \sigma_x^2 = E\left[ (X - \mu_x)^2 \right]

.. math:: s_x^2 = \frac{1}{n - 1} \sum_{i = 1}^n (x_i - \bar{x})^2

Similarly, the covariance between two variables is the expected value of
the product of the differences between the samples and their respective
means.

.. math:: \sigma_{xy} = E\left[ (X - \mu_x)(Y - \mu_y) \right]

.. math:: s_{xy} = \frac{1}{n - 1} \sum_{i = 1}^n (x_i - \bar{x})(y_i - \bar{y})

Thus, the covariance between a variable and itself is its variance,
:math:`s_{xx} = s_x^2`. Covariance is represented with a symmetric matrix
because :math:`s_{xy} = s_{yx}`. The variance of each variable lies on the
diagonal of the matrix. For example, consider taking a sampling of the
age, height, and weight of :math:`n` children. We could construct a
covariance matrix as follows.

.. math:: \mathbf{Covariance}(a, h, w) =
    \mat{s_{aa}, s_{ah}, s_{aw}; s_{ha}, s_{hh}, s_{hw}; s_{wa}, s_{wh}, s_{ww}}

The **correlation coefficient** of two variables is a measure of their
linear dependence.

.. math:: r_{xy} = \frac{s_{xy}}{s_x\,s_y}

A matrix of correlation coefficients has ones on the diagonal because each
variable is perfectly correlated with itself. Correlation values near zero
indicate that the variables are largely uncorrelated, while values near
one or negative one indicate a strong positive or negative linear
relationship. Correlation coefficients are generally more useful than
covariance values because they are scaled to a fixed range
(:math:`-1 \leq r \leq 1`).

.. math:: \mathbf{R}(a, h, w) =
    \mat{1, r_{ah}, r_{aw}; r_{ha}, 1, r_{hw}; r_{wa}, r_{wh}, 1}

The |M| functions that compute the covariance and correlation coefficient
matrices are ``cov`` and ``corrcoef``. In the following example, matrix
:math:`\bf{A}` has 100 random numbers in each of two columns. Half of the
value of each element in the second column comes from the first column,
and half comes from another random number generator. The variance of each
column is on the diagonal of the covariance matrix. The covariance between
the two columns is on the off-diagonal. The off-diagonal of the matrix of
correlation coefficients shows that the two columns have a positive
correlation.

::

    >> A = 10*randn(100, 1);
    >> A(:,2) = 0.5*A(:,1) + 5*randn(100, 1);
    >> Acov = cov(A)
    Acov =
       94.1505   50.0808
       50.0808   50.6121
    >> Acorr = corrcoef(A)
    Acorr =
        1.0000    0.7255
        0.7255    1.0000

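The relation :math:`r_{xy} = s_{xy}/(s_x s_y)` ties the two results
together: dividing the off-diagonal covariance by the product of the two
column standard deviations (the square roots of the diagonal entries of
``Acov``) reproduces the off-diagonal entry of ``Acorr``. A minimal check,
using the variables from the example above:

::

    >> r12 = Acov(1,2)/sqrt(Acov(1,1)*Acov(2,2))
    r12 =
        0.7255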