It’s Time to Re-evaluate Our Love Affair with Statistical Significance

This is not a breakup letter. It’s a collection of hard-to-swallow pills that will improve our relationship with statistical significance.

The marketing world is love-drunk over statistical significance.

Honestly, can you blame us?

What other test conclusion could offer a more perfect reassurance of accurate outputs than a statistically significant one?

I think my heart even skips a beat when I hear the alliteration of those two words together. 

Statistical… significance.

“Stat sig,” for those of us who really want to peacock our hipster badges of data analytics slang.

I, too, am a statistical significance lover.

But stat sig has broken my heart one too many times, and it’s time to talk about it.

Is there a dark side to this eternal romance? And is it holding us back from healthy love?

In this post, I’ll take a look at six flaws in our relationship with statistical significance and break them down as hard pills we must swallow to begin improving that relationship.

Flaw #1: Most SEO tests can’t achieve statistical significance

Some marketing channels are more conducive to statistical significance than SEO. 

SEO (for various reasons) has more barriers to statistically significant test conclusions than our digital marketing siblings.

The main reason statistical significance remains so elusive in SEO testing is that SEO’s test environment is uniquely interconnected with Google’s search environment, where only one page variation can appear at a time. I cover more of these nuanced factors in this article.

In fact, if you’re not working with a website that uses programmatic landing page groups to attract large audiences, then you’re plain out of luck in your quest for stat sig.
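To make that barrier concrete, here’s a rough sketch of a standard two-proportion sample-size calculation in Python. The 2% baseline conversion rate, 10% relative lift, alpha, and power are all hypothetical assumptions, not figures from any real test:

```python
from scipy.stats import norm

# Hypothetical inputs: a 2% baseline conversion rate and a hoped-for
# 10% relative lift (2.0% -> 2.2%), tested at alpha = 0.05 with 80% power.
p1, p2 = 0.020, 0.022
alpha, power = 0.05, 0.80

z_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value (~1.96)
z_beta = norm.ppf(power)           # power term (~0.84)

# Standard two-proportion sample-size approximation, per variant.
n = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2
print(f"Sessions needed per variant: {n:,.0f}")  # roughly 80,000
```

If a page template only sees a few hundred sessions a day, a number like that is simply out of reach in any reasonable testing window, which is exactly why those large programmatic page groups matter.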

This means that SEO professionals must adopt testing methods that don’t hinge on statistical significance to fuel our testing programs and meet growth targets.

Flaw #2: Achieving it might consume a lot of time and resources

Although some channels and automation systems are making it easier than ever to launch, measure, and report on A/B tests to statistically significant p-values, many experiments still carry high execution costs.

This is especially true for SEO, where the complexity of split-testing techniques requires advanced tools and expertise, as well as the right web environment, to pull off.

Of course, if you’re running a high-performance paid search campaign or email campaign, you may be equipped with automation software that lets you extract p-values with relative ease, but not all A/B tests are this straightforward. Certainly not in SEO testing.

Flaw #3: There are other ways to measure and validate successful experiments

While statistical significance is a powerful methodology that will always have its place in the world of marketing experiments, there are many alternatives we can rely on to guide our marketing strategies.

At a high level, statistical significance and its alternatives trace back to two main approaches in statistical modeling: Frequentist vs. Bayesian statistics. Kevin Indig covers these really nicely in The Bayesian Growth Mindset.

In brief: the Bayesian approach expresses what we know as probabilities and updates those probabilities as new evidence arrives. Take the following illustration, for example:
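Here’s a minimal sketch of that idea in Python, assuming a hypothetical A/B test with made-up conversion counts. We give each variant a Beta posterior (a uniform prior updated by the observed data) and ask how often the challenger beats the control:

```python
import numpy as np

# Hypothetical results: variant A converted 40 of 1,000 sessions,
# variant B converted 52 of 1,000 sessions.
a_conv, a_n = 40, 1000
b_conv, b_n = 52, 1000

rng = np.random.default_rng(42)

# Beta(1 + conversions, 1 + non-conversions): the posterior for each
# variant's true conversion rate under a uniform prior.
a_rate = rng.beta(1 + a_conv, 1 + a_n - a_conv, size=100_000)
b_rate = rng.beta(1 + b_conv, 1 + b_n - b_conv, size=100_000)

prob_b_wins = (b_rate > a_rate).mean()
expected_lift = ((b_rate - a_rate) / a_rate).mean()

print(f"P(B beats A): {prob_b_wins:.1%}")  # ~90% for these numbers
print(f"Expected relative lift: {expected_lift:.1%}")
```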

In the absence of statistical significance (the hallmark of the Frequentist approach), Bayesian statistics allow us to operate on a spectrum of confidence levels.

The lower our confidence in a given outcome, the less we should be willing to bet or invest in that outcome. The higher our confidence, the more we can bet and invest.

These investments allow us to make faster decisions, get comfortable with ambiguity, and scale experimentation efforts to higher velocities.

For further reading, I strongly recommend checking out this article by Robert Neal, where he talks about Bayesian intervals in terms of “expected utility.”  

Flaw #4: It might be blinding us to higher-impact testing frameworks

As we discussed in the previous section, statistical significance isn’t the only game in town.

But it is the most talked-about amongst my marketing colleagues. 

I think this overwhelming popularity has been a detriment to the marketing community at large, as it holds us hostage to the idea that if a test is not statistically significant, then it’s invalid. 

Simply put, most marketers seem to have developed an ignorance (whether blind or willful) of expected-utility testing.
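To illustrate, here’s what a simple expected-utility check could look like. This is a hedged sketch; the win probability and dollar figures are entirely hypothetical:

```python
# Hypothetical decision inputs for shipping a test variant.
p_win = 0.78          # posterior probability the variant wins (from a Bayesian test)
gain_if_win = 12_000  # estimated monthly upside in dollars if it wins
loss_if_lose = 3_000  # rollback cost plus estimated downside if it doesn't

# Probability-weighted payoff of shipping the variant.
expected_utility = p_win * gain_if_win - (1 - p_win) * loss_if_lose
print(f"Expected utility of shipping: ${expected_utility:,.0f}")  # $8,700 here
```

A positive number says the bet is worth taking, even though a 78% win probability would never clear a conventional significance bar.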

As more companies update their statistical models to Bayesian techniques (Google Optimize, for example, uses Bayesian stats), we should update our approaches to marketing experiments and SEO testing frameworks.

Flaw #5: It slows down decision-making

We touched on the higher resource investments that sometimes accompany testing to statistical significance, but equally important is the time investment that statistically significant experiments require.

Even when we’re using automation tools that cut down on implementation costs, statistically significant tests take a lot of time. 
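A quick back-of-the-envelope illustration, reusing the hypothetical ~80,000-sessions-per-variant figure from the Flaw #1 sketch and an assumed 1,500 daily sessions:

```python
# Hypothetical: two variants at ~80,000 sessions each (see the Flaw #1
# sketch), running on a page group that earns 1,500 sessions per day.
sessions_needed = 2 * 80_000
daily_sessions = 1_500

days = sessions_needed / daily_sessions
print(f"Estimated test duration: {days:.0f} days")  # ~107 days
```

That’s more than a full quarter spent waiting on a single decision.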

No big deal if nothing is on the hook, but if you’ve got projects and decisions waiting to move forward, statistical significance can slow marketing growth and start to rack up insane amounts of operational debt.

Flaw #6: It doesn’t always lead to adoption

This last pill might be the most painful one to swallow.

After all of our sweat, investment, and hopeful optimism in marketing experiments, leaders often make decisions that override the hard-earned conclusive data.

No matter which statistical model you favor for your marketing experiments, the risk of non-adoption is always present, and it happens more frequently than our data-driven minds care to admit.

Even though we’ll never be able to avoid the pain of non-adoption entirely, Bayesian testing can help mitigate the frustration: by lowering our time investments, we can build a more scalable program of concurrent experiments and fall-back experiments.

Non-adoption doesn’t quite sting as much when we’ve got more tests in the pipeline.

We can still maintain a healthy relationship with statistical significance

After all we’ve been through together, it wouldn’t be right to cut statistical significance out of our lives. 

Sure, it has its flaws, but statistical significance and Frequentist statistical models will always have a place in our hearts, as well as in our marketing strategies.

If we can get comfortable expanding our relationship with Bayesian-like frameworks that allow for faster decision-making, we can maintain a healthy relationship with statistical significance while we build more effective and scalable marketing experimentation programs.

Further reading