
If Something Is True, Does It Mean It's Important? Understanding Statistical Significance

Nov 18, 2021

Significance in the statistical sense should not be confused with significance in the practical sense. Consider the difference between something having strategic importance and something being statistically significant. Statistical significance means that there is enough evidence to suggest that the relationship observed in the collected sample also exists in the broader population. In other words, the observed effect is unlikely to be due to chance alone.

What does statistical significance involve?

In an experiment, conclusions about a population are usually extrapolated from a representative sample. Because the sample does not include every data point in the population, there will naturally be sampling error.

For example, assume there is a set of cohorts within a membership program, and the success of a membership initiative is measured using a sample taken from each cohort. The measured value may appear higher for one cohort than for the others. However, the sample drawn from that cohort may not have represented its population well.

Both the variation in the underlying population and the sample size contribute to sampling error. The effect of sampling error grows as samples get smaller. Generally, with larger samples, a statistically significant result is less likely to be an artifact of random variation.
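The effect of sample size on sampling error can be seen in a small simulation. The sketch below assumes a hypothetical cohort of 10,000 members whose research-request counts are roughly normal with mean 12 and standard deviation 4; the names `population` and `spread_of_sample_means` are illustrative only. It draws many random samples of each size and measures how much their means vary from sample to sample.

```python
import random
import statistics

def spread_of_sample_means(population, n, trials=2000, seed=0):
    """Draw `trials` random samples of size n (without replacement) and
    return the standard deviation of their means, an empirical measure
    of sampling error."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

# Hypothetical population: research requests per member for one cohort.
pop_rng = random.Random(42)
population = [max(0, round(pop_rng.gauss(12, 4))) for _ in range(10_000)]

for n in (10, 100, 1000):
    print(n, round(spread_of_sample_means(population, n), 3))
```

The spread shrinks roughly in proportion to 1/√n, so a sample ten times larger cuts sampling error by a factor of about three.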

With a more varied population, confidence that the findings are statistically significant decreases. When the data is more widely dispersed around its mean, as in the red distribution in Figure 2, there is more variation and therefore higher sampling error. In the red distribution, the number of research requests varies more from member to member.

With the narrower distribution shown in black, most members make around the same number of research requests. Confidence in the findings is greater because the data is less scattered, and a sample drawn from it is more likely to resemble the underlying population.
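One common way to express this is the standard error of the mean, s/√n, where s is the sample standard deviation. In the sketch below, the two samples are hypothetical stand-ins for the red and black distributions: both have ten members and a mean of 12 requests, but the red sample is far more spread out.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical research-request counts; both samples have n = 10 and mean 12.
black = [11, 12, 12, 13, 12, 11, 13, 12, 12, 12]  # narrow spread
red = [4, 20, 9, 15, 2, 22, 12, 18, 6, 12]        # wide spread

for name, sample in (("black", black), ("red", red)):
    print(name, statistics.fmean(sample), round(standard_error(sample), 3))
```

With the same sample size and the same mean, the red sample's standard error comes out roughly ten times larger, purely because of its greater spread.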

How is statistical significance determined?

Determining statistical significance involves establishing a null hypothesis. A null hypothesis is a statement initially assumed to be true. Regarding the red and black distributions above, a null hypothesis may be that "There isn't a difference in the average number of research requests for the two populations." You are trying to determine whether the data provide enough evidence to reject this null hypothesis.

An alternative hypothesis should also be established. This is the statement you are looking for evidence to support. Given the above distributions, an alternative hypothesis might be that "There is a difference in the average number of research requests for the two populations."
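In symbols, the two hypotheses can be written as follows, where $\mu_{\text{red}}$ and $\mu_{\text{black}}$ (notation introduced here for illustration) denote the average number of research requests in the two populations:

$$
H_0 : \mu_{\text{red}} = \mu_{\text{black}} \qquad\qquad H_1 : \mu_{\text{red}} \neq \mu_{\text{black}}
$$

Because $H_1$ allows a difference in either direction, it calls for a two-sided test.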

Another component of assessing statistical significance is the significance level, a threshold for deciding whether the null hypothesis should be rejected. The most commonly used significance level is 0.05, although other values can be used; no single threshold is correct for every analysis.

A statistical test appropriate to the data produces a probability known as the p-value: the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. The p-value is compared against the significance level. If the p-value is smaller than the threshold, the null hypothesis is rejected and the result is called statistically significant, meaning it is unlikely to be explained by chance alone. The smaller the p-value, the stronger the evidence against the null hypothesis. If the p-value is larger than the threshold, the result is considered non-significant and the null hypothesis is not rejected, which is not the same as showing that it is true.
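The article does not name a specific test, so the sketch below substitutes a permutation test, one general-purpose way to obtain a p-value for a difference in means. The cohort data are hypothetical, and `permutation_p_value` is an illustrative helper, not a library function. If the null hypothesis holds, the group labels should not matter, so the function reshuffles them many times and counts how often a difference at least as large as the observed one appears by chance.

```python
import random
import statistics

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels are exchangeable, so the
    pooled values are shuffled and re-split, and we count how often the
    shuffled difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to numerator and denominator counts the observed labeling
    # itself and keeps the estimate from ever being exactly zero.
    return (extreme + 1) / (n_permutations + 1)

ALPHA = 0.05  # significance level, chosen before looking at the data

# Hypothetical research-request counts for two cohorts.
cohort_a = [14, 15, 13, 16, 15, 14, 17, 15, 16, 14]
cohort_b = [11, 12, 13, 11, 12, 10, 13, 12, 11, 12]

p = permutation_p_value(cohort_a, cohort_b)
print(f"p = {p:.4f}")
print("reject H0" if p < ALPHA else "fail to reject H0")
```

With these made-up numbers the cohorts barely overlap, so the p-value falls well below 0.05 and the null hypothesis is rejected. Fixing `seed` makes the result reproducible; more permutations give a more precise estimate of the p-value.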

What other factors surround statistical significance?

Non-sampling error also occurs when samples are used to generalize about a population. It includes bias that can arise from factors such as poorly worded survey questions, ill-suited sampling methods, or low response rates. While p-values produced by statistical tests help account for sampling error, quantifying non-sampling error is more challenging. Minimizing it involves structuring the study so that the results can be validated, which may mean building an element into the design that reduces the effect of a known source of error, such as pretesting survey questions before they are fielded.
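A small simulation illustrates why larger samples do not fix non-sampling error. The response model here is entirely hypothetical: members with more than 12 research requests are assumed to answer a survey 80% of the time and everyone else 30% of the time. Because respondents are not representative, the estimate stays biased no matter how many members are invited, and a p-value or confidence interval computed from those responses would not reveal the problem.

```python
import random
import statistics

# Hypothetical population of research requests per member.
pop_rng = random.Random(7)
population = [max(0, round(pop_rng.gauss(12, 4))) for _ in range(50_000)]
true_mean = statistics.fmean(population)

def respondent_mean(n, rng):
    """Invite n random members and average the answers of those who respond.

    Assumed (hypothetical) nonresponse pattern: members with more than 12
    requests respond 80% of the time, everyone else 30% of the time.
    """
    invited = rng.sample(population, n)
    responses = [x for x in invited if rng.random() < (0.8 if x > 12 else 0.3)]
    return statistics.fmean(responses)

for n in (100, 1_000, 10_000):
    error = respondent_mean(n, random.Random(n)) - true_mean
    print(f"n = {n:>6}: estimate minus true mean = {error:+.2f}")
# The error does not shrink toward zero as n grows, unlike sampling error.
```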

Confidence intervals are tied to significance levels (a 95% confidence interval corresponds to a significance level of 0.05) and are affected by variation and sample size. They convey how precise an estimate is likely to be: they are wider for a more varied population and narrower with bigger samples. For example, a 95% confidence level means that if many samples were drawn and an interval were calculated from each one, about 95 out of every 100 intervals would include the true population value and about 5 would not.
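The repeated-sampling interpretation can be checked by simulation. The sketch below builds approximate 95% intervals for a mean using the normal approximation (mean ± 1.96 standard errors), an assumption that works reasonably for moderate samples; small samples would normally use a t-based interval instead. It draws many fresh samples from a hypothetical population with a known mean of 12 and counts how many of the resulting intervals contain it.

```python
import math
import random
import statistics

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

def confidence_interval_95(sample):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    m = statistics.fmean(sample)
    half_width = Z_95 * statistics.stdev(sample) / math.sqrt(len(sample))
    return m - half_width, m + half_width

def coverage(true_mean=12, sd=4, n=50, repeats=2000, seed=0):
    """Fraction of intervals, each built from a fresh sample, that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        lo, hi = confidence_interval_95([rng.gauss(true_mean, sd) for _ in range(n)])
        hits += lo <= true_mean <= hi
    return hits / repeats

print(round(coverage(), 3))
```

Coverage close to, but not exactly, 0.95 is expected: the normal approximation is slightly too narrow for 50 observations, and the count is itself subject to simulation noise.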

What are a few caveats about statistical significance?

A result can be statistically significant even when the effect is so small that it has no practical importance. A small p-value does not necessarily imply importance; it indicates only that the finding is unlikely to be due to chance alone. Statistical significance should therefore not be the sole basis for judging whether an impact is meaningful.
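The sketch below shows how a very large sample can make a trivial difference statistically significant. It uses a two-sample z-test on summary statistics, assuming a known common standard deviation and equal group sizes, and reports Cohen's d (the difference in means divided by the standard deviation) as a simple effect-size measure. The cohort figures and the helper name `two_sample_z_test` are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test for a difference in means from summary statistics,
    assuming a common, known standard deviation and equal group sizes.
    Returns (p_value, cohens_d)."""
    se = sd * math.sqrt(2 / n_per_group)        # standard error of the difference
    z = (mean_a - mean_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    cohens_d = (mean_a - mean_b) / sd           # effect size in SD units
    return p_value, cohens_d

# Hypothetical: two very large cohorts whose average research requests
# differ by only 0.05 requests per member.
p, d = two_sample_z_test(10.05, 10.00, sd=2.0, n_per_group=50_000)
print(f"p = {p:.6f}, d = {d:.3f}")
```

With 50,000 members per group, a difference of 0.05 requests yields a p-value far below 0.05, yet d is only 0.025, well under the 0.2 that Cohen labeled a small effect. The same difference with 50 members per group would not be significant at all.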

Nina Anderson
Data Scientist