4.10. Statistical Significance¶
An observation from an experiment becomes an accepted fact when the data demonstrates statistical significance. That is, it must be shown that the observation is consistently true and not a random anomaly.
Let us consider a specific historical example. In the nineteenth century, lung cancer was a rare disease. Then, during the first half of the twentieth century, the smoking of tobacco products, especially cigarettes, grew in popularity. According to Gallup polling data, self-reported adult smoking in the U.S. peaked in 1954 at 45%. Medical doctors noticed that as smoking rates increased, so did occurrences of lung cancer. But observations were not enough to conclude that smoking was causing lung cancer. Some studies in the 1940s and 1950s pointed blame toward tobacco smoking, but the public was skeptical because so many were smokers, and the tobacco industry claimed that smoking was safe. Then from 1952 to 1955, E. Cuyler Hammond and Daniel Horn, scientists from the American Cancer Society, conducted an extensive study using over 187,000 male volunteers [HAMMOND54]. They showed that the rate of lung cancer among smokers is outside the range of what is statistically possible for nonsmokers.
The strategy is to show that a null hypothesis must be rejected because it is false. A null hypothesis, represented as \(H_0\), claims that the observation in question falls within the range of regular statistical occurrences. It usually advocates for the status quo belief. So a null hypothesis might be: “Smokers of tobacco products and nonsmokers are equally likely to develop lung cancer.” Whereas an alternative hypothesis, \(H_a\), might be: “Smoking tobacco products increases the risk of developing lung cancer.” The alternative hypothesis often promotes belief in a causal effect. It proposes that when some treatment is applied, those receiving it are changed. Example treatments and changes include: smoking leads to lung cancer, taking a medication heals people of a disease, or exercising reduces cholesterol. It is usually harder to directly show that the alternative hypothesis is true than to show that the null hypothesis is false, so we accept the alternative hypothesis when we can show that the null hypothesis is wrong.
To build a case for rejecting a null hypothesis, we develop the statistics of the population not receiving the treatment in question and then show that the statistic of the treatment population falls outside the range of possibilities for the null hypothesis to be true. So Hammond and Horn established the statistics for nonsmokers developing lung cancer and then showed that smokers develop lung cancer at a rate beyond what is possible for nonsmokers.
The tools that statisticians use to establish confidence intervals for population means and accept or reject null hypotheses with statistical significance require a normal distribution. When the data has a different probability distribution, it is usually possible to frame the problem as an instance of a normal distribution by taking advantage of the central limit theorem.
Hypothesis testing estimates whether two data sets likely came from the same population. We accept the null hypothesis if they likely came from the same, or equivalent, source, and reject the null hypothesis if we conclude that they are from different sources.
If the standard deviation of the null case population is known, use the Z-test with the \(Z\)-critical values. When the standard deviation is unknown, use the t-test, which uses the sample standard deviation and is based on the t-distribution, which is similar to the normal distribution except with more variability.
4.10.1. Z-Test¶
For the Z-test, we only need to calculate the sample mean of the \(H_a\) data and then calculate the \(Z\) variable that maps the sample mean of the treatment data set to a standard normal distribution,
\(Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\),
where \(\bar{x}\) is the sample mean, \(\mu_0\) is the population mean under the null hypothesis, \(\sigma\) is the known population standard deviation, and \(n\) is the sample size.
The \(Z\) variable is compared to a \(Z\)-critical value to determine if the \(H_a\) sample mean is consistent with the \(H_0\) population.
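As a minimal sketch with hypothetical numbers (not from a real study), suppose the null population has known mean \(\mu_0 = 100\) and standard deviation \(\sigma = 15\), and a treatment sample of \(n = 25\) has mean 108. Comparing \(Z\) to the hard-coded two-tailed 95% critical value of 1.96:

>> mu0 = 100; sigma = 15;  % hypothetical null population parameters
>> n = 25; x_bar = 108;    % hypothetical treatment sample
>> Z = (x_bar - mu0)/(sigma/sqrt(n))
Z =
2.6667
>> abs(Z) > 1.96           % true, so reject the null hypothesis

Since 2.6667 exceeds the critical value, this sample mean is too far from \(\mu_0\) to be a random anomaly at the 95% confidence level.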
4.10.2. t-Test¶
In practice, the t-test is much like the Z-test except the sample’s calculated standard deviation (\(s\)) is used rather than the population standard deviation (\(\sigma\)). Because each sample from the population will have a slightly different standard deviation, we use the student’s t-distribution, which has a slightly wider bell-shaped PDF than the standard normal PDF. The distribution becomes wider for smaller sample sizes (degrees of freedom). The t-distribution approaches the standard normal distribution as the sample size approaches infinity. The degrees of freedom (\(df\)) are one less than the sample size (\(df = n - 1\)). The t-distribution PDF plot for three \(df\) values is shown in figure Fig. 4.18.

Fig. 4.18 t-Distribution PDF plots showing two-tailed 95% critical values for infinity, 9, and 3 degrees of freedom.¶
The \(T\) statistic is calculated and compared to critical values to test for acceptance of a null hypothesis, as was done with the Z-test. The critical values are a function of the desired confidence level and the degrees of freedom (sample size). There are several ways to find the critical values. Tables on websites or in statistics books list the critical values for the t-distribution. Interactive websites, as well as several software environments, can calculate critical values. MATLAB requires the Statistics and Machine Learning Toolbox to calculate critical values. However, as with the Z-test, the \(t\)-critical values are constants, so they may be hard-coded into a program to test a null hypothesis.
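For instance, if the Statistics and Machine Learning Toolbox is available, the tinv function returns a \(t\)-critical value from the cumulative probability and the degrees of freedom. For a two-tailed test with \(\alpha = 0.05\) and \(df = 9\):

>> t_crit = tinv(1 - 0.05/2, 9)  % requires Statistics and Machine Learning Toolbox
t_crit =
2.2622

The same value appears in printed \(t\)-distribution tables, which is why hard-coding it is a reasonable alternative.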
4.10.3. Example: Counterfeit Competition¶
As an example, consider the following scenario. Suppose that our company makes a product that is very good at what it does. A new competitor has introduced a product that it claims is just as good as ours, but cheaper. We want to show that the competitor’s product is not as good as ours. So we draw sample sets of our product and the competitor’s product to establish the quality of each, which we will verify with a Z-test and a t-test.
Counterfeit Competition: Z-Test
For the sake of our example, let’s say that the metric of concern is a Bernoulli event. We obtained 500 units of our competitor’s product and 500 of our own for testing. We split each set of 500 products into 10 sample sets of 50 items.
We can use the Z-test for our product since we have a well-established success rate for it. In several tests, our company’s product has consistently shown a success rate of at least 90%. We want to verify that our latest product matches our established quality level. We put our successes and failures in a matrix and compute the mean of each column. According to the central limit theorem, the means of the sample sets give us a normal distribution that we can use with both Z-tests and t-tests.
You may wonder about the binomial distribution, which comes from the sum of successes of Bernoulli events. For a large number of trials, the binomial distribution is a discrete approximation of a normal distribution. The central limit theorem equations can scale and shift a binomial distribution into a standard normal distribution.
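A quick simulation (an illustration, not part of the product test) shows the scaling. Sums of \(n = 50\) Bernoulli trials with \(p = 0.9\) have mean \(np\) and standard deviation \(\sqrt{np(1-p)}\), so shifting and scaling by those values yields an approximately standard normal variable:

>> S = sum(rand(50, 10000) < 0.9);      % 10,000 binomial sums of 50 trials each
>> Zs = (S - 50*0.9)/sqrt(50*0.9*0.1);  % shift and scale toward standard normal
>> [mean(Zs), std(Zs)]                  % approximately 0 and 1

The sample mean and standard deviation of the scaled sums come out close to 0 and 1, as the central limit theorem predicts.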
In the simulation test of our product, we find a \(Z\) score that is below the two-tailed 95% \(Z\)-critical value of 1.96. So we accept the null hypothesis that the samples match the established quality of our product. Note that because \(\sigma\) here is the per-trial standard deviation, the \(n\) in the \(Z\) equation is the total number of Bernoulli trials.
>> X = rand(50, 10) < 0.9; % simulation setup: 10 sample sets of 50 trials
>> p = 0.9;
>> sigma = sqrt(p*(1-p))   % per-trial standard deviation, from Bernoulli equations
sigma =
0.3000
>> X_sample = mean(X);     % mean of each sample set
>> x_bar = mean(X_sample)  % overall mean of all 500 trials
x_bar =
0.9240
>> n = numel(X);           % 500 total Bernoulli trials
>> Z = (x_bar - 0.9)/(sigma/sqrt(n))
Z =
1.7889
Counterfeit Competition: t-Test
When we apply the t-test to our competitor’s product, we use statistics calculated from the data and our product’s established mean success rate. From a table of two-tailed \(t\)-critical values, we find that for \(\alpha = 0.05\) and \(df = 9\), \(t_{\alpha/2} = 2.262\). Although the success rate of our competitor’s product does not appear to be far below that of our own, the calculated \(T\) is less than the negative of our \(t\)-critical value. So we reject the null hypothesis and conclude with statistical significance that our competitor’s product is inferior to ours.
>> X = rand(50, 10) < 0.85; % simulate the competitor's product, 10 sample sets of 50 trials
>> n = 10;                  % number of sample sets
>> X_bar = mean(X);         % mean of each sample set
>> x_bar = mean(X_bar)
x_bar =
0.8620
>> s = std(X_bar)           % sample standard deviation of the sample means
s =
0.0382
>> T = (x_bar - 0.9)/(s/sqrt(n))
T =
-3.1425
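Since the \(t\)-critical value is a constant, the final acceptance test can be hard-coded, as this sketch shows (continuing from the variables above):

>> t_crit = 2.262;       % two-tailed 95% critical value for df = 9
>> reject = T < -t_crit  % true (1), so reject the null hypothesis

Because \(T = -3.1425\) falls below \(-2.262\), the competitor’s sample mean is outside the range explainable by random variation around our 90% success rate.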