14.17. PCA Data Classification

An important application of PCA is classification. Reducing the dimensionality of the data makes it easier to see how the attributes of items of the same type are similar to one another and how they differ from those of other types. The result of the analysis is often a plot showing the data in the PCA space.

In this assignment, we will look at a public-domain dataset and produce a scatter plot showing how groups of the data are clustered in the PCA space. Refer to the example in PCA for Clustering as a guide to how the PCA algorithm is applied.

Iris Plant Dataset

This is a classic dataset. It contains four attributes (sepal length, sepal width, petal length, and petal width) of three species of Iris plants. The measurements were collected by Edgar Anderson and used by R.A. Fisher in a paper published in 1936. Information about the dataset is available from the UCI Machine Learning Repository.

Use PCA to reduce the dimensionality of the data from four attributes to two principal components. Make a scatter plot of the samples in the two-dimensional PCA space.

iris_data.txt
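
As a rough illustration of the reduction from four attributes to two principal components, the following is a minimal sketch that uses the SVD of the mean-centered data. It is not necessarily the same procedure as the PCA for Clustering example. The data here is synthetic (three made-up groups of 30 samples with four attributes each), because the layout of iris_data.txt is not described in this section; the variable names X, labels, scores, and explained are hypothetical and chosen only for this example.

```matlab
% Synthetic stand-in data (assumption): 3 groups, 30 samples each, 4 attributes.
% Replace X and labels with the values loaded from the real data file.
n = 30;
centers = [0 0 0 0; 3 1 2 0; 6 2 4 1];   % arbitrary group centers
X = zeros(3*n, 4);
labels = zeros(3*n, 1);
for k = 1:3
    rows = (k-1)*n + (1:n);
    X(rows, :) = repmat(centers(k, :), n, 1) + 0.5*randn(n, 4);
    labels(rows) = k;
end

% Center each attribute (column) on its mean.
Xc = X - repmat(mean(X, 1), size(X, 1), 1);

% Economy-size SVD: the columns of V are the principal directions,
% ordered by decreasing singular value.
[~, S, V] = svd(Xc, 'econ');

% Project the centered samples onto the first two principal directions.
scores = Xc * V(:, 1:2);

% Fraction of the total variance kept by the two components.
sv2 = diag(S).^2;
explained = sum(sv2(1:2)) / sum(sv2);

% Scatter plot of the samples in the 2-D PCA space, one marker per group.
figure;
hold on;
markers = {'o', 's', '^'};
for k = 1:3
    idx = (labels == k);
    plot(scores(idx, 1), scores(idx, 2), markers{k});
end
hold off;
xlabel('PC 1');
ylabel('PC 2');
legend('Group 1', 'Group 2', 'Group 3');
title(sprintf('PCA projection (%.0f%% of variance)', 100*explained));
```

With the real data, X would hold the four measured attributes (one row per sample) and labels the species of each row. The finished figure can be written to a PNG file with a call such as print('-dpng', 'pca_scatter.png'), where the file name is only a placeholder.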

Submit both your MATLAB script and a PNG image file of your scatter plot.