4.6. Statistics on Matrices
4.6.1. Column Statistics
When a matrix is passed to a statistical function, the default is to work on each column of the matrix independently. The result is a row vector, which may in turn be used with other statistical functions if desired.
Some functions return multiple values for each column. For example, the diff function calculates the difference between adjacent elements in each column of the input matrix. The cumsum function also returns a matrix rather than a row vector.
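As a minimal sketch of the behavior described above, the following uses a small matrix whose values are assumed purely for illustration. It shows a row-vector result being passed to a second statistical function, and the matrix results returned by diff and cumsum.

```matlab
% Assumed example data: a 4x3 matrix chosen only for illustration
A = [1 2 3; 4 6 8; 9 12 15; 16 20 24];

colMean = mean(A);             % 1x3 row vector, one mean per column
overallMean = mean(colMean);   % scalar: a statistic of the column statistics

D = diff(A);                   % 3x3 matrix: adjacent-row differences in each column
C = cumsum(A);                 % 4x3 matrix: running sum down each column

disp(size(colMean))
disp(size(D))
disp(size(C))
```

Note that diff returns one fewer row than its input, while cumsum returns a matrix of the same size as its input.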
4.6.2. Changing Dimension
Many statistical functions accept an optional dimension argument that specifies whether the operation should be applied to the columns independently (the default) or to the rows.
If matrix A has size $10 \times 20$, then mean(A,2) returns a 10-element column vector, whereas mean(A) returns a 20-element row vector.
>> A = randi(25,10,20);
>> mC = mean(A);
>> mr = mean(A,2);
>> whos
  Name      Size      Bytes  Class     Attributes
  A        10x20       1600  double
  mC        1x20        160  double
  mr       10x1          80  double
Some functions, such as min, max, and diff, use the second argument for other purposes, which makes dimension the third argument. To skip the second argument, use a pair of empty square brackets for an empty vector, [].
>> Amin = min(A,[],2);
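The sketch below illustrates why the dimension moves to the third position for these functions. The matrix values are assumed for illustration only. With min, a non-empty second argument is treated as a value to compare against, and with diff it is the order of the difference.

```matlab
A = [4 9 2; 7 1 8];          % assumed 2x3 example matrix

colMin  = min(A);            % 1x3: minimum of each column
rowMin  = min(A, [], 2);     % 2x1: minimum of each row
rowMax  = max(A, [], 2);     % 2x1: maximum of each row
rowDiff = diff(A, 1, 2);     % 2x2: first-order differences along each row

% Here the second argument is a comparison value, not a dimension:
clipped = min(A, 5);         % element-wise minimum of A and 5

disp(rowMin)
disp(rowDiff)
```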
4.6.3. Covariance and Correlation
Covariance shows how distinct variables relate to each other. It is calculated in the same manner as the variance of a single variable. Variance (the square of the standard deviation) is the expected value of the squared difference between each sample and the mean of the variable. Covariance and correlation can be defined in terms of either populations or samples; here we mostly use the definitions for sampled data.
Similarly, the covariance between two variables is the expected value of the product of the differences between the samples and their respective means.
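Written out for sampled data, these two definitions take the following standard form. The symbols are assumed notation: $x_i$ and $y_i$ are the $n$ samples of each variable, and $\bar{x}$ and $\bar{y}$ are their sample means. The $n - 1$ divisor is the usual sample normalization, which is also the default used by MATLAB's var and cov functions.

```latex
\[
\sigma_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
\]
\[
\operatorname{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\]
```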
Thus, the covariance between a variable and itself is its variance, $\operatorname{cov}(x, x) = \sigma_x^2$. Covariance is represented with a symmetric matrix because $\operatorname{cov}(x, y) = \operatorname{cov}(y, x)$. The variance of each variable is on the diagonal of the matrix.
For example, consider taking a sampling of the age, height, and weight of children. We could construct a covariance matrix as follows.
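One way to lay out such a matrix is sketched here, using $a$, $h$, and $w$ as assumed labels for age, height, and weight. The variances sit on the diagonal, and each off-diagonal entry is mirrored across it.

```latex
\[
\begin{bmatrix}
\sigma_a^2 & \operatorname{cov}(a, h) & \operatorname{cov}(a, w) \\
\operatorname{cov}(h, a) & \sigma_h^2 & \operatorname{cov}(h, w) \\
\operatorname{cov}(w, a) & \operatorname{cov}(w, h) & \sigma_w^2
\end{bmatrix}
\]
```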
The correlation coefficient of two variables is a measure of their linear dependence.
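In the standard (Pearson) form, the correlation coefficient is the covariance scaled by the two standard deviations. The symbol $r_{xy}$ is assumed notation here.

```latex
\[
r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}
\]
```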
A matrix of correlation coefficients has ones on the diagonal since each variable is perfectly correlated with itself. Correlation values near zero indicate that the variables have little linear relationship to each other, while correlation values near one or negative one indicate strong positive or negative correlation. Correlation coefficients are generally more useful than covariance values because they are scaled to always have the same range ($[-1, 1]$).
The MATLAB functions to compute the covariance and correlation coefficient matrices are cov and corrcoef. In the following example, matrix A has 100 random numbers in each of two columns. Each value in the second column is the sum of half the corresponding first-column value and an independent random term, so about half of the second column's variance comes from the first column and half comes from another random number generator. The variance of each column is on the diagonal of the covariance matrix. The covariance between the two columns is on the off-diagonal. The off-diagonal of the matrix of correlation coefficients shows that the two columns have a positive correlation.
>> A = 10*randn(100, 1);
>> A(:,2) = 0.5*A(:,1) + 5*randn(100, 1);
>> Acov = cov(A)
Acov =
94.1505 50.0808
50.0808 50.6121
>> Acorr = corrcoef(A)
Acorr =
1.0000 0.7255
0.7255 1.0000
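To connect these outputs back to the definitions, the following sketch rebuilds both matrices by hand and compares them with cov and corrcoef. The seed passed to rng is an assumption added only for repeatability, so the numbers will differ from the output shown above. Subtracting the row vector mean(A) from A relies on implicit expansion, which requires MATLAB R2016b or later.

```matlab
rng(0);                                  % assumed seed, for repeatability only
A = 10*randn(100, 1);
A(:,2) = 0.5*A(:,1) + 5*randn(100, 1);

n  = size(A, 1);                         % number of samples per column
Ac = A - mean(A);                        % subtract each column's mean
AcovManual = (Ac' * Ac) / (n - 1);       % sample covariance matrix

s = sqrt(diag(AcovManual));              % standard deviation of each column
AcorrManual = AcovManual ./ (s * s');    % divide entry (i,j) by s(i)*s(j)

% Both differences should be at the level of floating-point round-off
disp(norm(AcovManual - cov(A)))
disp(norm(AcorrManual - corrcoef(A)))
```

The same scaling can be checked against the printed output: 50.0808 / sqrt(94.1505 × 50.6121) ≈ 0.7255, which matches the off-diagonal entry of Acorr.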