4.1. Introduction to Statistics

What does the data tell us? Are our design decisions data-driven? You have probably heard these questions before. Unfortunately, asking these questions is much simpler than answering them. Identifying what data to consider is hard; separating real information from noise and variability is harder still. Statistics gives us some tools that can help.

The first step is to find and collect our data. If we are concerned about our company’s products, then the set of all products made is called the population. Collecting data from every item in the population is likely to be impractical, if not impossible. Collecting data from random samples of the population is more practical and usually gives sufficient information. In some cases, however, we can consider an entire population. For example, every hydroelectric power generating facility in a country built during the last 50 years defines a large but manageable population. We can also consider the whole population when it consists of results that follow from established rules of probability, such as the outcomes from a pair of thrown dice. As we define and use statistics, we will use different symbols to indicate whether we are working with a population or with sampled data.
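The dice example gives a population small enough to enumerate in full, which makes the difference between a population and a sample easy to see. The following Python sketch is only an illustration: the sample size of 50 and the random seed are arbitrary assumptions, not values from the text.

```python
import itertools
import random
import statistics

# Population: the sum for each of the 36 equally likely outcomes
# of a pair of fair six-sided dice.
population = [a + b for a, b in itertools.product(range(1, 7), repeat=2)]
population_mean = statistics.mean(population)

# Sample: 50 simulated throws, drawn with replacement from the population.
# The seed and the sample size are arbitrary choices for this illustration.
rng = random.Random(0)
sample = [rng.choice(population) for _ in range(50)]
sample_mean = statistics.mean(sample)

print(f"population size: {len(population)}")
print(f"population mean: {population_mean}")
print(f"sample mean (n = 50): {sample_mean:.2f}")
```

The population mean is exactly 7, while the sample mean will usually land near 7 without matching it. That gap is the sampling variability that later statistics need to account for.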

A random variable is a number whose value is the outcome of an experiment, observation, or measurement influenced by unpredictable randomness. The probabilities with which the random variable takes the values in its domain define its probability distribution. We will describe some standard distributions in Probability Distributions. Random variables can be either discrete or continuous. A discrete random variable takes on values from a countable set, such as the integers. A continuous random variable can take any value from an uncountable set of real numbers, such as an interval.

Knowing our data’s probability distribution, we can calculate and use statistical parameters to provide the information needed to make inferences, conclusions, and ultimately decisions. We will need probability estimates regarding our population. The tools that allow us to determine probabilities are the probability mass function (PMF) for discrete random variables and the probability density function (PDF) for continuous random variables. The PMF gives the probability that the random variable takes each possible value, so to calculate the probability of an event, we sum the PMF terms for the values that make up the event. The PDF gives the probability density at each value, and the probability that the random variable falls within a specified range is the area under the PDF over that range. Thus, we use definite integrals of the PDF to find probabilities for continuous random variables. Two example PDF plots are shown in Fig. 4.1.
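As a rough illustration of the two calculations, the Python sketch below sums PMF terms for the total of two fair dice and numerically integrates a PDF over a range. The standard normal distribution, the helper names normal_pdf and integrate, and the use of the trapezoidal rule are all assumptions chosen for this example; they are not necessarily the distributions plotted in Fig. 4.1.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

# Discrete case: PMF of the sum of two fair dice, as exact fractions.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {total: Fraction(n, 36) for total, n in counts.items()}

# P(sum >= 10) is the sum of the PMF terms for 10, 11, and 12.
p_at_least_10 = sum(p for total, p in pmf.items() if total >= 10)


# Continuous case: PDF of a normal distribution (standard normal by default).
def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, n=10_000):
    """Approximate the definite integral of f on [a, b] (trapezoidal rule)."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)


# P(-1 <= X <= 1) is the area under the PDF between -1 and 1.
p_within_one_sigma = integrate(normal_pdf, -1.0, 1.0)

print(f"P(sum >= 10) = {p_at_least_10}")
print(f"P(-1 <= X <= 1) = {p_within_one_sigma:.4f}")
```

The discrete probability comes out as an exact fraction because it is a finite sum. The continuous probability is a numerical approximation of the integral; for the standard normal it is close to 0.68.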