4.7. Plots of Statistical Data
Here, we generate a data set of 200 random numbers to illustrate two standard plots that show how data is distributed. The data comes from a normal distribution random number generator with a mean of 50 and a standard deviation of 15.
d = 50 + 15*randn(1, 200); % normal mean=50, std=15
4.7.1. Box Plot
A box plot gives a quick picture of the distribution of the data. It shows the range covered by each quarter (25%) of the data; the values that separate the quarters are called quartiles. A box plot example is shown in Fig. 4.14. The vertical line at the bottom of the plot starts at the lower limit value and extends up to the bottom of the box at the first quartile, \(Q_1\). The box in the center spans the middle 50% of the data, from the first quartile, \(Q_1\), to the third quartile, \(Q_3\). A horizontal line inside the box marks the second quartile, which is the median of the data. Another vertical line extends from the third quartile up to the upper limit value.
The lower and upper limits may be the minimum and maximum values of the data. However, they are often computed relative to the center region of the data so that outliers are excluded from the ranges drawn for the four quarters. The width of the center region is called the interquartile range, \(IQR = Q_3 - Q_1\). The lower and upper limit values (\(LL\) and \(UL\)) are computed as \(LL = Q_1 - 1.5{\times}IQR\) and \(UL = Q_3 + 1.5{\times}IQR\). Any data points less than \(LL\) or greater than \(UL\) are classified as outliers and may appear as scatter points in the box plot.
>> boxplot(d')

Fig. 4.14 A Box Plot with Quartiles Noted
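The quartile and limit calculations described above can be sketched with standard MATLAB functions only. This is a minimal illustration, not the method used by any particular boxplot function: it assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is one common convention among several, and the fixed seed and variable names (iqrVal, outliers) are assumptions chosen for the example.

```matlab
rng(0);                              % fixed seed (assumption) so the run is repeatable
d = 50 + 15*randn(1, 200);           % normal mean=50, std=15

s  = sort(d);                        % sorted copy of the data
n  = numel(s);
Q2 = median(s);                      % second quartile (median)
Q1 = median(s(1:floor(n/2)));        % median of the lower half
Q3 = median(s(ceil(n/2)+1:end));     % median of the upper half

iqrVal = Q3 - Q1;                    % IQR = Q3 - Q1
LL = Q1 - 1.5*iqrVal;                % lower limit
UL = Q3 + 1.5*iqrVal;                % upper limit
outliers = d(d < LL | d > UL);       % points outside [LL, UL]

fprintf('Q1 = %.2f, Q2 = %.2f, Q3 = %.2f\n', Q1, Q2, Q3);
fprintf('LL = %.2f, UL = %.2f, outliers found: %d\n', LL, UL, numel(outliers));
```

Other quartile conventions interpolate between sorted values, so the numbers printed here may differ slightly from those drawn by a given box plot function.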
The box plot function from MathWorks is part of the Statistics and Machine Learning Toolbox, and a few free box plot functions are available on the MathWorks File Exchange. Some of those use functions from extra toolboxes. However, the free boxplot function [LUENGO15] is a simple function that uses only standard MATLAB functions.
The boxplot function expects a single data set to be a column vector. When it is given a matrix, it treats each column as a separate data set and draws one box plot per column in the same figure.
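As a brief sketch of the column convention, the following builds a matrix whose three columns are separate data sets and passes it to boxplot. The means and standard deviations of the extra columns are made up for illustration, and the call assumes that a boxplot function is available on the path (either the toolbox version or a File Exchange version).

```matlab
% Three data sets of 200 samples each, one per column.
% The distribution parameters are assumptions chosen for illustration.
D = [50 + 15*randn(200, 1), ...      % mean=50, std=15
     60 +  5*randn(200, 1), ...      % mean=60, std=5
     40 + 25*randn(200, 1)];         % mean=40, std=25

boxplot(D)                           % one box per column of D
```

A row vector such as d from the start of this section is transposed first, as in boxplot(d'), so that its 200 values form one column.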
4.7.2. Histogram
A histogram plot divides the range of the data into intervals (called bins) and shows how many values fall into each bin. If the data set is large, the histogram begins to take the shape of the probability density function (PDF) of the distribution that the data came from.
The histogram function has several possible parameters, but the most common usage is to pass two arguments: the data and the number of bins to use. Another useful pair of options is 'Normalization', 'probability', which scales the height of each bin to the fraction of the data values that fall in it, so that the bin heights sum to one. This scaling makes it convenient to overlay the plot with another histogram or a PDF plot. An example histogram is shown in Fig. 4.15.
histogram(d, 40)

Fig. 4.15 A Histogram Plot
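The normalization options can be sketched as follows. The first part uses histcounts to check that 'probability' bin heights sum to one. The second part overlays the normal PDF on the histogram, and for that it uses the 'pdf' normalization rather than 'probability', because 'pdf' scales the bars so that their total area is one, which puts them on the same vertical scale as a density curve. The PDF is written out from the normal density formula, so no toolbox function is needed; the fixed seed and the variable names are assumptions for the example.

```matlab
rng(0);                                   % fixed seed (assumption) for repeatability
mu = 50; sigma = 15;
d = mu + sigma*randn(1, 200);             % normal mean=50, std=15

% 'probability': each bin height is the fraction of values in that bin.
p = histcounts(d, 40, 'Normalization', 'probability');
fprintf('Sum of bin probabilities: %.4f\n', sum(p));

% 'pdf': bar areas sum to one, matching the scale of a density curve.
h = histogram(d, 40, 'Normalization', 'pdf');
hold on
x = linspace(min(d), max(d), 200);
f = exp(-(x - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));  % normal PDF
plot(x, f, 'LineWidth', 1.5)
hold off
```

With only 200 samples and 40 bins, the bars follow the curve only roughly; a larger data set gives a closer match.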