Statistical significance should not be confused with practical significance. Something can have strategic importance without being statistically significant, and vice versa. Statistical significance means there is enough evidence to suggest that the relationship observed in the collected sample also exists in the broader population. In other words, the effect is unlikely to be due to chance.
What does statistical significance involve?
In an experiment, conclusions are usually extrapolated from a representative sample. Since not every data point in the population is included, there will naturally be sampling error.
For example, assume a membership program contains several cohorts, and the success of a membership initiative is measured using a sample taken from each cohort. The measurement may appear higher for one cohort than the others, yet the sample drawn for that cohort may not have adequately represented its population.
Both variation in the underlying population and the sample size contribute to sampling error. The effect of sampling error increases with smaller samples; with larger samples, an observed effect is less likely to be the product of chance.
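To make the link between sample size and sampling error concrete, the sketch below simulates repeated sampling from a hypothetical population. The population parameters (a mean of 20 requests and a standard deviation of 5) and the sample sizes are illustrative assumptions, not values from the example above.

```python
# A minimal sketch of how sampling error shrinks as sample size grows.
# The population parameters below are hypothetical, chosen for illustration.
import numpy as np

rng = np.random.default_rng(seed=42)
population = rng.normal(loc=20, scale=5, size=100_000)  # simulated population

for n in (10, 100, 1000):
    # Draw many samples of size n and measure how much the sample means vary.
    sample_means = [rng.choice(population, size=n).mean() for _ in range(500)]
    print(f"n={n:>4}: spread of sample means = {np.std(sample_means):.3f}")

# The spread of the sample means (the standard error) falls roughly as
# 1/sqrt(n), so larger samples yield estimates that are less likely to
# differ from the true population value by chance.
```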
With a more varied population, confidence that the findings are statistically significant decreases. When the data is more widely dispersed around its mean, as shown in the red distribution in Figure 2, there is more variation and therefore greater sampling error. In the red distribution, the number of research requests made by members differs more from member to member.
With the narrower distribution shown in black, most members can be assumed to make about the same number of research requests. Confidence in the findings is greater because the data is less scattered, and in this case a sample is more likely to resemble the underlying population.
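The same simulation approach can illustrate the effect of population spread. In the sketch below, two hypothetical populations share the same mean, but the "wide" one (analogous to the red distribution) is far more dispersed than the "narrow" one (analogous to the black distribution); all numbers are invented for illustration.

```python
# A small illustration of how population spread drives sampling error.
# Two hypothetical populations of research-request counts share a mean of 20;
# only their dispersion differs. All values are made up.
import numpy as np

rng = np.random.default_rng(seed=0)
narrow = rng.normal(loc=20, scale=2, size=100_000)   # black: tightly clustered
wide = rng.normal(loc=20, scale=10, size=100_000)    # red: widely dispersed

for name, pop in (("narrow", narrow), ("wide", wide)):
    sample_means = [rng.choice(pop, size=50).mean() for _ in range(500)]
    print(f"{name}: spread of sample means = {np.std(sample_means):.3f}")

# Sample means drawn from the wide population scatter far more, so any single
# sample is less likely to resemble its population, and confidence in the
# findings drops accordingly.
```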
How is statistical significance determined?
Determining statistical significance involves establishing a null hypothesis. A null hypothesis is a statement initially assumed to be true. Regarding the red and black distributions above, a null hypothesis may be that “There isn’t a difference in the average number of research requests for the two populations.” You are trying to determine if this null hypothesis is false.
An alternative hypothesis should also be established. This is the statement you are seeking evidence to support. Given the above distributions, an alternative hypothesis might be that “There is a difference in the average number of research requests for the two populations.”
Another component in assessing statistical significance is the significance level, a threshold for deciding whether the null hypothesis should be rejected. The significance level commonly used is 0.05, although other values can be used. There isn’t a single threshold value that always confirms statistical significance.
A probability known as the p-value is generated by an applicable statistical test and compared against this threshold. If the p-value is smaller than the threshold, the null hypothesis is rejected: the result is considered statistically significant and is unlikely to be due to chance alone. The smaller the p-value, the stronger the evidence against the null hypothesis. If the p-value is larger than the threshold, the result is considered non-significant, and the null hypothesis is not rejected.
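The sketch below walks through this procedure using a two-sample t-test (scipy's ttest_ind), one applicable test for comparing two means. The two samples of research-request counts are simulated stand-ins for the red and black populations; the group means, spreads, and sizes are all assumptions made for illustration.

```python
# A hedged sketch of the testing procedure described above, using a
# two-sample t-test. All sample values are simulated, not real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
group_a = rng.normal(loc=20, scale=5, size=60)  # sample from one population
group_b = rng.normal(loc=23, scale=5, size=60)  # sample from the other

alpha = 0.05  # the significance level (threshold)
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: the difference in means is significant.")
else:
    print("Fail to reject the null hypothesis: the result is non-significant.")
```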
What other factors surround statistical significance?
Non-sampling error also occurs when samples are used to generalize about a population. It includes the bias that can arise from factors such as poorly worded survey questions, ill-suited sampling methods, or low response rates. While the p-values produced by statistical tests help account for sampling error, quantifying non-sampling error is more challenging. Minimizing it involves structuring the analysis so that the results can be validated, for example by introducing a design element that reduces the effect of the error.
Confidence intervals are tied to significance levels and are affected by variation and sample size. They convey how precise a calculated statistic is likely to be: they are wider for a more varied population and narrower for larger samples. For example, a 95% confidence interval means that if the sampling process were repeated 100 times and an interval calculated each time, about 95 of those intervals would contain the true population value and about 5 would not.
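As a sketch of how such an interval might be computed, the example below builds a 95% confidence interval for a sample mean using the t-distribution. The sample itself is hypothetical, generated purely to demonstrate the calculation.

```python
# A minimal sketch of a 95% confidence interval for a sample mean,
# using the t-distribution. The sample values are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
sample = rng.normal(loc=20, scale=5, size=40)  # hypothetical sample of requests

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")

# A wider interval (more population variation or a smaller sample) means the
# estimate is less precise; a narrower interval means it is more precise.
```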
What are a few particulars on statistical significance?
It is possible for results to be statistically significant yet reflect an effect so small that they are unimportant in practice. A small p-value does not necessarily imply importance; it only indicates that a finding is unlikely to be due to chance. Statistical significance alone should not be used to judge whether an impact is meaningful.
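The sketch below illustrates this pitfall under assumed conditions: with a very large sample, even a tiny, invented 0.1-unit difference in means produces a p-value far below any common threshold.

```python
# A sketch of a statistically significant but practically trivial effect.
# The 0.1-unit difference in means is invented purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
group_a = rng.normal(loc=20.0, scale=5, size=500_000)
group_b = rng.normal(loc=20.1, scale=5, size=500_000)  # barely different

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"p-value = {p_value:.2e}")  # tiny: statistically significant
print(f"difference in means = {group_b.mean() - group_a.mean():.2f}")

# The p-value clears any common threshold, yet a 0.1-request difference may
# be practically meaningless, so significance alone should not drive decisions.
```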