4.1. Introduction to Statistics

What does the data tell us? Are our design decisions data-driven? You have probably heard these questions before. Unfortunately, asking these questions is much simpler than answering them. Identifying which data to consider is hard. Distinguishing factual information from the noise and stochastic variability is even more complicated. Statistics gives us some tools that can help.

The first step is to find and collect data drawn from a population. If we are concerned about our company’s products, then the set of every product made is the population. Collecting data from every item in the population is likely to be impractical, if not impossible. Collecting data from random samples of the population is more practical and usually gives useful information. We can consider an entire population only when its size is a manageable finite number, or when the population is derived from established rules of probability, such as the outcomes from a pair of thrown dice. As we define and use statistics, we must use different symbols to indicate whether a statistic is calculated for a population or for sampled data.
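The difference between a population and a sample can be illustrated with the dice example. The following is a minimal Python sketch, not part of the original text. The sample size of 50 and the seed value are arbitrary choices made for illustration. The population is the set of sums from all 36 equally likely outcomes of two dice, so its mean can be computed exactly, while the sample mean only estimates it.

```python
import itertools
import random
import statistics

# Population: the sum for each of the 36 equally likely outcomes of two dice.
population = [a + b for a, b in itertools.product(range(1, 7), repeat=2)]
population_mean = statistics.mean(population)  # exactly 7

# Sample: 50 simulated throws of two dice.
# The seed is arbitrary and only makes the run repeatable.
rng = random.Random(0)
sample = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(50)]
sample_mean = statistics.mean(sample)

print(f"population size = {len(population)}")
print(f"population mean = {population_mean}")
print(f"sample mean     = {sample_mean:.2f}")
```

The sample mean will usually be close to the population mean, but not equal to it, and it changes with each new sample. That variability is why statistics for populations and for samples are kept distinct.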

random variable

A random variable is a number whose value is the outcome of an experiment, observation, or measurement influenced by unpredictable randomness.

The probabilities with which a random variable takes each value, or falls within a given range of values, define its probability distribution. We will describe some standard distributions in Probability Distributions. Random variables can be either discrete or continuous. A discrete random variable takes values from a countable set, such as the integers. A continuous random variable can take any real value within an interval.

When we know our data’s probability distribution, we can calculate and use statistical parameters to provide the information needed to make inferences, conclusions, and ultimately decisions. The tools that allow us to determine probabilities are the probability mass function (PMF) for discrete random variables and the probability density function (PDF) for continuous random variables. The PMF gives the probability that the random variable takes each possible value. To calculate the probability of a set of values, we sum the corresponding terms of the PMF. The PDF gives the probability density, or relative likelihood, of the random variable near each value; the probability of any single exact value is zero. Thus, we use definite integrals of the PDF to find the probability that a continuous random variable falls within a specified range.
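The two calculations can be sketched in Python as follows. This is an illustrative example under stated assumptions, not code from the original text. The discrete case reuses the pair of dice; the continuous case assumes a standard normal distribution, chosen only because it is familiar. The helper names normal_pdf and integrate are hypothetical, and the trapezoidal rule stands in for whatever integration method one prefers.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

# Discrete case: PMF of the sum of two fair dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {total: Fraction(n, 36) for total, n in counts.items()}

# P(sum >= 10) is the sum of the PMF terms for 10, 11, and 12:
# 3/36 + 2/36 + 1/36 = 1/6.
p_at_least_10 = sum(pmf[k] for k in (10, 11, 12))


# Continuous case: PDF of a normal distribution (assumed example).
def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, n=10_000):
    """Approximate the definite integral of f over [a, b] (trapezoidal rule)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h


# P(-1 <= X <= 1) is the definite integral of the PDF from -1 to 1.
p_within_1 = integrate(normal_pdf, -1.0, 1.0)

print(f"P(sum >= 10)    = {p_at_least_10}")
print(f"P(-1 <= X <= 1) = {p_within_1:.4f}")
```

The PMF terms are stored as exact fractions, so they sum to exactly 1 over all possible values. The integral of the PDF is only approximated numerically; for the standard normal distribution, the result should be close to 0.6827.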

Fig. 4.1 shows an example of two PDF plots.