
If Something Is True, Does It Mean It's Important? Understanding Statistical Significance

The statistical perspective of significance should not be confused with the practical sense of significance. Consider the difference between something having strategic importance versus something being statistically significant.
Nov 18, 2021

Statistical significance means that there is enough evidence to suggest that the relationship observed in the collected sample also exists in the broader population. In other words, the observed effect is unlikely to be explained by chance alone.

What does statistical significance involve?

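In an experiment, conclusions about a population are usually drawn from a representative sample. Because the sample does not include every data point in the population, there will naturally be sampling error: the difference between a value calculated from the sample and the true value for the population.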

For example, assume a membership program is divided into several cohorts, and the success of a membership initiative is measured using a sample drawn from each cohort. The measured value may appear higher for one cohort than for the others. However, the sample drawn from that cohort may not have adequately represented its population, so the apparent difference could be a product of sampling error.

Both the variation in the underlying population and the sample size contribute to sampling error, and its effect grows as samples get smaller. Generally, with larger samples, an observed difference is less likely to be an artifact of random sampling variation.
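
To make the link between sample size and sampling error concrete, the following Python sketch simulates a hypothetical membership population (research requests per member drawn from a normal distribution with mean 20 and standard deviation 8, values chosen purely for illustration) and measures how much the sample mean moves from one random sample to the next. The population parameters and the helper name spread_of_sample_means are assumptions, not part of the original analysis.

```python
import random
import statistics

# Hypothetical population: research requests per member, normally
# distributed with mean 20 and standard deviation 8 (illustrative only).
random.seed(42)
population = [random.gauss(20, 8) for _ in range(100_000)]

def spread_of_sample_means(pop, n, trials=1_000):
    """Draw `trials` random samples of size n (without replacement) and
    return the standard deviation of their means, an empirical estimate
    of the sampling error for that sample size."""
    means = [statistics.fmean(random.sample(pop, n)) for _ in range(trials)]
    return statistics.stdev(means)

for n in (10, 100, 1_000):
    print(f"sample size {n:>5}: spread of sample means = "
          f"{spread_of_sample_means(population, n):.3f}")
```

The spread should shrink roughly in proportion to 1 divided by the square root of the sample size, so a tenfold increase in sample size cuts the sampling error by a factor of about three.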

With a more varied population, it becomes harder to establish that a finding is statistically significant. When the data are more widely dispersed around the mean, as in the red distribution in Figure 2, there is more variation and therefore higher sampling error. In the red distribution, the number of research requests differs more from member to member.

In the narrower distribution shown in black, most members submit roughly the same number of research requests. Because the data are less scattered, confidence in the findings is greater, and a sample is more likely to resemble the underlying population.
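
The effect of dispersion can be expressed with the standard error of the mean, s / sqrt(n), where s is the sample standard deviation and n is the sample size. The sketch below assumes hypothetical samples of 50 members each, with standard deviations of 12 (standing in for the red distribution) and 3 (standing in for the black one); these numbers are illustrative and are not taken from Figure 2. The function name standard_error is likewise an assumption.

```python
import math
import random
import statistics

random.seed(7)
n = 50  # same (assumed) sample size for both groups

# Hypothetical samples of research requests per member:
# "red" is widely dispersed (sd 12), "black" is narrowly dispersed (sd 3).
red = [random.gauss(20, 12) for _ in range(n)]
black = [random.gauss(20, 3) for _ in range(n)]

def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

print(f"red standard error:   {standard_error(red):.2f}")
print(f"black standard error: {standard_error(black):.2f}")
```

At the same sample size, the more scattered sample produces a noticeably larger standard error, which is the higher sampling error described above.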

How is statistical significance determined?

Determining statistical significance begins with establishing a null hypothesis, a statement that is initially assumed to be true. For the red and black distributions above, a null hypothesis might be: “There isn’t a difference in the average number of research requests for the two populations.” The test then assesses whether the data provide enough evidence to reject this null hypothesis.

An alternative hypothesis should also be established. This is the statement you are seeking evidence for. Given the above distributions, an alternative hypothesis might be: “There is a difference in the average number of research requests for the two populations.”

Another component of assessing statistical significance is the significance level, a threshold for deciding whether the null hypothesis should be rejected. The most commonly used significance level is 0.05, although other values, such as 0.01 or 0.10, are also used. No single threshold confirms statistical significance in every situation, so the level should be chosen before the data are analyzed.

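A statistical test appropriate to the data produces a probability known as the p-value: the probability of observing a result at least as extreme as the one in the sample, assuming the null hypothesis is true. The p-value is compared against the significance level. If the p-value is smaller than the threshold, the null hypothesis is rejected and the result is deemed statistically significant, meaning it is unlikely to be explained by chance alone. The smaller the p-value, the stronger the evidence against the null hypothesis. If the p-value is larger than the threshold, the result is considered non-significant and the null hypothesis is not rejected. This does not prove that the null hypothesis is true, only that the data do not provide enough evidence against it.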
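
As a sketch of the full procedure, the Python code below tests the null hypothesis of no difference in average research requests between two groups using a permutation test. This is one of several valid choices; a two-sample t-test is a common alternative. The research-request counts, the 0.05 significance level, and the function name permutation_test are hypothetical and serve only to illustrate how a p-value is estimated and compared against the threshold.

```python
import random
import statistics

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis (no difference between the groups) the group
    labels are interchangeable, so the pooled data are shuffled many times
    and we count how often a shuffled difference is at least as extreme as
    the observed one. That proportion estimates the p-value.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)])
                   - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate from being exactly zero.
    return (extreme + 1) / (n_resamples + 1)

# Hypothetical research-request counts for two member groups (made up).
group_a = [12, 15, 14, 10, 13, 17, 11, 16, 14, 13]
group_b = [18, 21, 17, 19, 22, 16, 20, 18, 23, 19]

alpha = 0.05  # significance level, chosen before looking at the data
p = permutation_test(group_a, group_b)
if p < alpha:
    print(f"p = {p:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p:.4f} >= {alpha}: do not reject the null hypothesis")
```

Shuffling the pooled data mimics a world in which the null hypothesis is true, so the fraction of shuffles that produce a difference at least as large as the observed one matches the definition of the p-value given above.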

What other factors surround statistical significance?

Non-sampling error also occurs when samples are used to generalize about a population. It includes the bias that can arise from factors such as poorly worded survey questions, ill-suited sampling methods, or low response rates. While p-values produced by statistical tests help account for sampling error, quantifying non-sampling error is more challenging, and collecting a larger sample does not remove it. Minimizing non-sampling error involves structuring the study so that its results can be validated, which may mean building elements into the design that reduce the effect of the error.
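
Non-sampling error behaves differently from sampling error because it does not shrink as the sample grows. The simulation below assumes, purely for illustration, that members with more research requests are more likely to answer a survey; the population, the response rule, and the function name survey_mean are all hypothetical. The gap between the respondents' average and the true population average stays roughly the same whether 100 or 10,000 members are invited.

```python
import random
import statistics

# Hypothetical population of 100,000 members' research-request counts
# (normal, mean 20, sd 8, floored at 0); values are illustrative only.
random.seed(1)
population = [max(0.0, random.gauss(20, 8)) for _ in range(100_000)]
true_mean = statistics.fmean(population)

def survey_mean(pop, n, rng):
    """Invite n randomly chosen members and average the responses received.

    Assumption for illustration: a member's chance of responding rises with
    their number of requests (x / 40, capped at 1), so respondents are not
    representative of the population.
    """
    invited = rng.sample(pop, n)
    responses = [x for x in invited if rng.random() < min(1.0, x / 40)]
    return statistics.fmean(responses)

rng = random.Random(2)
for n in (100, 1_000, 10_000):
    bias = survey_mean(population, n, rng) - true_mean
    print(f"invited = {n:>6}: respondent mean minus true mean = {bias:+.2f}")
```

Increasing the number of invitations makes the estimate more stable, but it stays centered on the wrong value; only a change in design (for example, raising the response rate) addresses that kind of error.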

Confidence intervals are tied to significance levels (a 95% confidence interval corresponds to a significance level of 0.05) and are affected by variation and sample size. They convey how precisely a sample statistic is likely to estimate the population value. They are wider when the population is more varied and narrower when the sample is larger. The 95% describes the method rather than any single interval: if the sampling were repeated many times and a 95% confidence interval calculated from each sample, about 95 of every 100 such intervals would include the true population value and about 5 would not.
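
The repeated-sampling interpretation of a 95% confidence interval can be checked by simulation. The sketch below assumes a normal population with a known standard deviation, so the simple z-interval (sample mean plus or minus 1.96 times sigma divided by the square root of n) applies; with an estimated standard deviation, a t-interval would be used instead. The parameter values and the function name ci_coverage are illustrative assumptions.

```python
import math
import random
import statistics

def ci_coverage(true_mean=20.0, sigma=8.0, n=30, trials=2_000, seed=0):
    """Draw many samples from a normal population with known sigma, build a
    95% z-interval from each, and return the fraction that contain true_mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / trials

print(f"fraction of intervals containing the true mean: {ci_coverage():.3f}")
```

The reported coverage should land close to 0.95. The half-width formula also makes the other two points visible: it grows with the population's standard deviation and shrinks as the sample size increases.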

What are a few caveats about statistical significance?

A result can be statistically significant even when the effect is too small to matter in practice; with a large enough sample, even a tiny difference can produce a small p-value. A small p-value therefore does not imply importance. Statistical significance indicates only that a finding is unlikely to be due to chance alone, so it should not be the sole basis for judging whether an impact is meaningful.
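
The gap between statistical and practical significance can be shown with a simple two-sample z-test. The sketch assumes two equal-sized groups whose average research requests differ by only 0.2, with a common, known standard deviation of 10; these values and the function name two_sample_z are hypothetical. As the number of members per group grows, the same tiny difference moves from clearly non-significant to highly significant, while the standardized effect size (Cohen's d, the mean difference divided by the standard deviation) stays at 0.02.

```python
import math
from statistics import NormalDist

def two_sample_z(mean_diff, sd, n_per_group):
    """Two-sided z-test for a difference in means between two equal-sized
    groups that share a known standard deviation `sd`.
    Returns the z statistic and the p-value."""
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    z = mean_diff / se
    p = 2 * NormalDist().cdf(-abs(z))     # two-sided tail probability
    return z, p

# Hypothetical scenario: the groups differ by 0.2 research requests on
# average, with a common standard deviation of 10 (illustrative values).
mean_diff, sd = 0.2, 10.0
for n in (100, 1_000, 200_000):
    z, p = two_sample_z(mean_diff, sd, n)
    print(f"n per group = {n:>7}: z = {z:5.2f}, p = {p:.2g}")

# Cohen's d (mean difference / standard deviation) does not depend on n.
print("Cohen's d =", mean_diff / sd)
```

Reporting an effect size, or a confidence interval for the difference, alongside the p-value helps readers judge whether a significant result is also an important one.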

Author
Nina Anderson
Data Scientist