2.10. Importing Data Into MATLAB

Data files having several formats, such as TXT, CSV, XLS, and XLSX, can be imported into MATLAB both interactively and from commands. The latest list of supported import and export file formats is available on MathWorks’ website [FileFmts].

Search MathWorks’ website for videos and examples for importing spreadsheet data into MATLAB. In particular, the video at [Eigerman] and examples at [Quintero] are quite helpful.

2.10.1. Saving and Loading Workspace Data

You can save data from the workspace and load data into the workspace using the save and load commands. These commands use files with a .mat file name extension.

Create myData.mat containing current workspace variables.

>> save myData

You can selectively save individual variables instead of saving the entire MATLAB workspace. Save x and y to someFile.mat.

>> save someFile x y

Load variables from myData.mat into the workspace.

>> load myData

2.10.2. Import Tool

The Import Tool is convenient for importing data from files with an intuitive graphical user interface. If the file has labels in the first row for each column, they will appear at the top of each column. You can also enter label names, see figure Fig. 2.7. The labels are used for naming the column vectors or table fields.

The Import Tool is available from the Home tab, labeled as “Import Data”. Notice the pull-down menu specifying “Output Type”. The choice of “Column vectors” yields similar vector data as we have previously used. Each column of data from the file is imported as a column vector.

Notice the two choices in the pull-down menu on the right side of the Import Tool labeled “Import Selection”. Options “Generate Script” and “Generate Function” can automate future imports.

Notice also how missing data will be handled. A pull-down menu offers choices. Replacing missing data with NaN (not a number) is often a good choice.

2.10.3. Reading Tables

We are most familiar with column vectors at this point, but it may not always be the best choice. Importing data as a table is simpler to perform from commands. The columns of a table can be converted to column vectors if needed. Each table column doesn’t need to be of the same data type. The data can be considered a series of observations (rows) over a set of variables (columns). The sortrows function is used to sort the data based on any table column, and the data in each row is kept together.

The command for importing a table from a file is readtable(’filename’). The columns of a table may be accessed as column vectors using the notation of tableName.columnName.

2.10.4. Dealing with Missing Data

Data values that are missing show up as NaN (Not a Number. The MATLAB function ismissing returns a logical vector showing where data is missing. There are three basic ways to deal with missing data.

  1. Ignore it: Some MATLAB statistics functions have an option ’omitnan’ that will ignore any NaN values.

  2. Delete it: Setting a table row to [\(\,\)] removes the data.

  3. Interpolate it: MATLAB has a function called fillmissing that will interpolate missing data in a vector between its neighboring data. Data Interpolation has more information about data interpolation.

The following tutorial shows example code for dealing with missing data. It uses the file autompg.csv, which is a public domain dataset available from the UCI Machine Learning Repository [Dua:2019] as file auto-mpg.data-original. The file contains six data values about 398 car models from 1970 to 1982. For our investigation, we will import only the data columns for miles per gallon (mpg) in column 1, horsepower in column 4, and weight in column 5. The most convenient way to import the data into MATLAB is to use the Import Data Tool. Figure Fig. 2.7 shows how column names are entered at the top of each column and the desired columns are selected for import. The data will be imported into a table, and any missing values will be replaced with NaN (not a number).

The Import Data Tool is used to import desired data columns.

Fig. 2.7 The Import Data Tool is used to import desired data columns.

After the data is imported into a table, we verify the names of the table fields. We will also save the data to make it easier to load later. The saved file is autompg.mat.

>> autompg.Properties.VariableNames
ans =
  1x3 cell array
    {'mpg'}    {'horsepower'}    {'weight'}
>> save autompg

We will put our commands into a script, but use the “Run Section” button from the Edit tab to take things one step at a time.

First, we will check the three data fields to see if any data is missing. The ismissing function will return a logical array, and the nnz function will count how many values are missing.

% File: carTable.m
%% Dealing with missing data
%   Run this code one section at a time.

% Take a look at the part of the table
% autompg(1:4,:)

% count missing data
disp('missing MPG')
nnz(ismissing(autompg.mpg))
disp('missing Horsepower')
nnz(ismissing(autompg.horsepower))
disp('missing Weight')
nnz(ismissing(autompg.weight))

The horsepower field has six missing values.

missing MPG
ans =
     0
missing Weight
ans =
     0
missing Horsepower
ans =
     6

If we want to calculate statistics of the data, we can ignore the missing data with the ’omitnan’ option. Most of the statistics functions have this option. For more on statistics functions, see Common Statistical Functions and check the documentation of the functions you want to use.

%% Ignore missing data for mean

disp('mean Horsepower')
mean(autompg.horsepower, 'omitnan')

We may want to delete the rows with missing values. The following code also displays how many rows were deleted.

%% Delete rows with missing data

%  idx holds index of any rows with missing (NaN) values
idx = ismissing(autompg.horsepower);
autompg(idx,:) = [];    % remove rows missing values
disp('missing Horsepower')
nnz(idx)
missing Horsepower
ans =
     6

We would expect a negative correlation between horsepower and mpg values. A quick scatter plot verifies this. We could use MATLAB’s corrcoef function to calculate the correlation, but a scatter plot verifies what we already expect. We only need to see the plot to verify the relationship, so we didn’t bother to label the axes or give the plot a title. Figure Fig. 2.8 shows the scatter plot.

Scatter plot of car horsepower against mpg

Fig. 2.8 Scatter plot of car horsepower against mpg

%% scatter plot to observe correlation between MPG
%  and Horsepower

scatter(autompg.horsepower, autompg.mpg)

Next, after reloading the data, we will use linear data interpolation to fill in the missing data. Since we know that the horsepower data is correlated to the mpg data, we can get reasonable results by sorting the data based on the mpg field. The long output (not shown) shows the values before and after the filled-in values and verifies that the estimates are reasonable.

Whether it is better to delete rows with missing values or use interpolation depends on the application, the data available, and your preferences.

%% Interpolate missing data
% re-load data with missing values
load autompg

% sort rows by MPG to group cars.
autompg = sortrows(autompg, 'mpg');
% find missing data
idx = find(ismissing(autompg.horsepower));
disp('missing data')
for i = idx'
    disp(autompg.horsepower(i-2:i+2))
    disp(' ')
end

% use linear interpolation
autompg.horsepower = ...
    fillmissing(autompg.horsepower, 'linear');
disp('interpolated data')
for i = idx'
    disp(autompg.horsepower(i-2:i+2))
    disp(' ')
end

2.10.5. Exporting Table Data

Use the writetable function to export a table to a delimited text file or a spreadsheet.

>> writetable('myTable.csv');

By default, writetable uses the file name extension to determine the appropriate output format.

Note

There is a lot more information about MATLAB tables that are not included here.

Now complete the tutorial in Working with Tables.