The In-Depth Guide to SEO A/B Testing

You may already be slightly acquainted with SEO testing. You might have even run an SEO experiment or two of your own, and now you want to move on from time-based SEO tests to the more advanced world of A/B experiments. 

But did you know that there are two different approaches you can take with SEO A/B testing?

This SEO A/B testing guide includes step-by-step instructions for both techniques, along with tools and resources to help you get started.

If you’d like to skip to a specific section, this table of contents should help you find the information more easily:


Introduction to SEO A/B testing

SEO A/B testing (also referred to as ‘SEO split testing’) is the process of splitting a group of templatized pages into a control group A and a variant group B, altering the variant group to match a hypothesis, and then measuring the performance differences between the A group and the B group to determine a winner.

The SEO A/B Testing Process in a Nutshell

Let’s dive into the basic process.

Whilst conversion-oriented A/B testing methods are designed to split the traffic on one URL into an A user group and a B user group (see below)…

A/B split testing in the SEO world is done by splitting one group of templatized pages into an A page group (control) and a B page group (variant). As these two page groups are split, each one should be accruing relatively equal levels of traffic (a 50/50 traffic split). 

The process in a nutshell should look like this:

  1. Determine a sound hypothesis for the experiment.
  2. Aggregate data from the templatized page group and split the pages into an A group and a B group with roughly 50% organic traffic to each group.
  3. Deploy your SEO test hypothesis across the B group of variant pages.
  4. Measure the organic traffic changes to both groups until one group has reached statistical significance, or until the experiment shows a neutral / undetectable result.
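To make step 2 concrete, here’s a minimal Python sketch of one way to split a templatized page group into two buckets with roughly equal organic traffic. The page names and traffic numbers are entirely made up, and a greedy assignment is just one simple approach (not the only one):

```python
import random

def split_pages(pages_with_traffic, seed=42):
    """Greedily split {url: monthly_organic_clicks} into two groups
    with roughly equal total organic traffic."""
    rng = random.Random(seed)
    items = list(pages_with_traffic.items())
    rng.shuffle(items)                      # break ties between equal pages fairly
    items.sort(key=lambda kv: kv[1], reverse=True)
    group_a, group_b = [], []
    total_a = total_b = 0
    for url, clicks in items:
        # always assign the next page to whichever group is currently lighter
        if total_a <= total_b:
            group_a.append(url)
            total_a += clicks
        else:
            group_b.append(url)
            total_b += clicks
    return group_a, group_b, total_a, total_b

# hypothetical templatized pages with synthetic traffic figures
pages = {f"/product/{i}": 100 + (i * 37) % 400 for i in range(20)}
group_a, group_b, total_a, total_b = split_pages(pages)
print(total_a, total_b)  # the two totals should land close to a 50/50 split
```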

Key Distinctions

🚨 Key Distinction #1: SEO A/B tests are not the same as CRO A/B tests 

Notice how I specifically indicated that SEO A/B tests require a group of templatized pages. This is the most critical distinction I could possibly burn into your mind because far too many SEO professionals confuse the SEO A/B split testing process with its more traditional CRO counterpart. 

You cannot, I repeat, you cannot apply SEO A/B split tests on a single URL. 

Don’t confuse the SEO A/B split testing process with the traditional CRO process. Even though the two processes share some commonalities, SEO A/B testing is very different: the A and B groups are not splitting users on one page, but rather splitting a whole group of templatized pages into an A group of pages and a B group of pages.

If you need to stop and get more clarification around this, I’ve written an entire guide about the difference between CRO testing and SEO testing.

This article assumes that the reader has a basic understanding of A/B testing for CRO.

🚨 Key Distinction #2: The page group on your website must have a strong level of monthly visitor traffic

SEO split testing cannot be done on just any website. 

As we’ve just learned, your website must have a page group with near-identical, templatized, or programmatic pages. So SEO A/B testing would not be ideal on a traditional B2B blog, for example, unless that blog somehow incorporates a sizeable templatized page group.

Second, your website needs to be generating a high number of visitors to the page group that you’ll be testing on. Ideally, this number should be 500,000 monthly organic visitors or more.

If traffic to the page group is low, the measurability of your results becomes less reliable. 

🚨 Key Distinction #3: A/B split testing is also different from time-based SEO testing

A quick Google search for SEO A/B testing will pull up all kinds of misinformed results. Most of these articles are either talking about CRO A/B testing, or they’re providing instructions for time-based SEO testing. Neither of these are actual SEO A/B split tests.

Time-based SEO tests are outside the scope of this article, but the main difference in time-based SEO testing processes is that they measure before and after data, rather than splitting URLs into A and B groups.

If you find information online that instructs users to measure tests with before and after data, it’s safe to say that the article is misinformed about the correct SEO A/B split testing process.

There are two measurement techniques for SEO A/B split testing to be aware of

Measurement technique #1: Average Difference Change

A/B testing for SEO first began to emerge with a widely popularized 2015 study from Pinterest’s growth engineering team. In this study, the team designed a methodology to calculate average difference changes between a control group of programmatic landing pages and a variant group of programmatic pages (measured by organic search traffic).

Pinterest’s SEO experiment took the SEO community by storm. 

Before the Pinterest study came about, SEOs hadn’t fully cracked the code on statistically significant A/B testing methods.

To understand why A/B experiments would have been so puzzling for SEO teams, just take a look at a separate piece I wrote on why CRO testing is incompatible with SEO.

Measurement technique #2: CausalImpact Inference

In 2016, not long after Pinterest’s experiment was made public, Etsy’s team iterated on the design with an even more airtight methodology.

The team at Etsy took the idea of segmenting traffic by URL, and they added two statistical techniques that would make A/B SEO experiments even more reliable than the first SEO testing methodology. 

The two techniques? 

Stratified random sampling, and causal impact inference

Now if you’ve made it this far, I’ll spare you the intricate statistical workings of CausalImpact and focus mainly on simplifying the process for you to run this type of experiment for yourself. 

(The history and timelines here are to the best of my knowledge. If anyone out there can poke holes in my history, please reach out to keep my knowledge in check and avoid inadvertent misinformation). 

Knowing when to use each of these techniques

The main difference between the two SEO A/B testing methods (besides process) is the degree to which each achieves statistical significance. Method #1 is faster and easier to execute, but results in a slightly lower degree of statistical significance than method #2, which achieves higher accuracy at the cost of more time and resources.

I like to visualize these on a diagram that looks like this:

So, if you’re optimizing for speed and operating on few resources, go for method #1 (average difference change).

Otherwise, if your main objective is accuracy and you’ve got the capabilities to execute on method #2 (Causal Impact inference), then fire away!

How Average Difference Change is calculated (method #1)

For most A/B SEO experiments, our goal in segmenting the A group and the B group is to split traffic as evenly as possible between the two groups. Method #1 assumes that there will still be variances between them, and it accounts for those variances by calculating the average difference in daily clicks throughout the entire duration of the experiment’s before period.

The calculation looks like this:

Variant group average clicks / day – Control group average clicks / day = Average daily difference
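As a rough Python sketch of how this calculation plays out (the daily click counts below are hypothetical), you compute the baseline gap between the groups in the before period, the gap during the test, and report the change:

```python
def average_difference_change(control_daily, variant_daily, pre_days):
    """Method #1: compare the post-launch gap between variant and control
    clicks against the baseline gap from the pre-launch ('before') period."""
    pre = [v - c for v, c in zip(variant_daily[:pre_days], control_daily[:pre_days])]
    post = [v - c for v, c in zip(variant_daily[pre_days:], control_daily[pre_days:])]
    baseline = sum(pre) / len(pre)    # average daily difference before the test
    observed = sum(post) / len(post)  # average daily difference during the test
    return observed - baseline        # positive => the variant group gained clicks

# hypothetical daily clicks: first 5 days are the before period, last 5 the test
control = [100, 102, 98, 101, 99, 100, 101, 99, 102, 98]
variant = [90, 92, 88, 91, 89, 105, 106, 104, 107, 103]
print(average_difference_change(control, variant, pre_days=5))  # → 15.0
```

Here the variant group historically lagged the control group by 10 clicks a day, so its 5-clicks-a-day lead during the test represents a 15-click daily improvement.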

Plotted out in visual form, it looks like this:

How CausalImpact is calculated (method #2)

CausalImpact is an open-source statistical modeling package introduced by Google to measure the impact of a causal event (such as a test launch) in a time-series model where the data after the event does not have a clear and measurable control group.

In simpler terms, time-based experiments have really bad control data because we can’t measure alternate realities and we don’t have time machines, so CausalImpact builds an alternate reality for us using crazy-cool Bayesian statistical models.

The alternate reality is what statisticians call a ‘counterfactual estimate’ or a ‘synthetic control.’
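To build intuition for what a synthetic control is, here’s a deliberately naive Python sketch: fit a least-squares line to the pre-period relationship between the two groups, use it to predict what the variant “would have done,” and measure the lift over that prediction. To be clear, CausalImpact itself uses far richer Bayesian structural time-series models; this toy version (with made-up data) only demonstrates the concept:

```python
def counterfactual_lift(control, variant, pre_days):
    """Illustration only: build a naive 'synthetic control' for the variant
    group from its pre-period relationship with the control group, then
    measure the variant's cumulative lift over that prediction."""
    xs, ys = control[:pre_days], variant[:pre_days]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # ordinary least-squares fit on the pre-period only
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # the counterfactual: what the variant would have done with no change
    predicted = [intercept + slope * x for x in control[pre_days:]]
    actual = variant[pre_days:]
    return sum(a - p for a, p in zip(actual, predicted))  # cumulative lift

# hypothetical data: the variant historically tracks at 90% of control...
control = [100, 110, 120, 130, 140, 100, 110]
variant = [90, 99, 108, 117, 126, 100, 109]  # ...then jumps after launch
print(round(counterfactual_lift(control, variant, pre_days=5), 2))  # → 20.0
```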

Yes, there is far more to CausalImpact, but fortunately for SEO professionals, the gracious team at Distilled released a free DIY split testing tool that lets us utilize this model without needing to get a doctoral degree in data science. 

The step-by-step guides for each SEO A/B testing method

Steps for Method #1: Average Difference Change with SEOTesting.com

For the record, no, I am not affiliated with SEOTesting, nor do I receive any compensation for mentioning the tool.

The quickest and easiest way to start A/B testing for SEO is through SEOTesting.com’s split testing feature. 

Yes, there are other SEO testing tools and processes that you could explore, but as someone who’s searched high and low through the various options, I keep coming back to SEOTesting because of the low cost and ease of use that the program offers.

Your steps for method #1 will be:

  1. Establish a clear, and measurable hypothesis.
  2. Split your A and B pages into equally-yoked page groups.
  3. Change only the pages in your variant group (B pages) to reflect your hypothesis.
  4. Create a new split test in SEOTesting.com (assuming you’ve already set up an account). Use this video tutorial if you need help.
  5. At the end of the designated testing period, measure the average difference change.
  6. Ensure that the winning result gets implemented on all pages.
  7. Rinse and repeat (or build a new hypothesis to experiment with). 

Example graph showing the average difference change from SEOTesting.com.

Steps for Method #2: Causal Impact Inference with Distilled’s DIY Split Tester

Method #2 will inevitably be more challenging to learn and execute, but as I mentioned earlier, this method is the most accurate available if you’re running the experiments on your own and need a stronger measurement technique.

Steps 1, 2, and 3 – Establish a hypothesis, split your page groups by traffic, and change the variant pages to reflect your hypothesis.

The first three steps for this process are the same as steps 1-3 in method #1. 

1. Establish a clear hypothesis that can be directly measured by organic traffic growth. 
2. Split your A & B page groups into equally-yoked sets of URLs based on traffic volumes instead of URL count.
3. Change all variant pages to reflect your experiment’s hypothesis.

Step 4 – Make a copy of the Google Sheets template.

When it comes time to analyze the results of your experiment, you’ll not only need to use the DIY Split Tester tool from Distilled, but you’ll also need to rely on Google Sheets to build the cumulative analysis graphs that help you detect exactly if and when your data reaches a 95% statistical significance threshold.

To make things easier, here’s a link to the Google Sheets template that I use, which you can copy for your own purposes.

Step 5 – Collect the total organic entrances (by day) for groups A & B.

After you’ve set the changes live to your variant group, you may want to let the experiment run for a few weeks prior to this step. 

As soon as it’s time to measure, you’ll need to collect the daily organic entrances for each day the experiment is live + 100 days before the start date. And, you’ll need to gather this data for each group of pages (the control group, as well as the variant group).

So, if your experiment has been live for 4 weeks, you’ll need to run an export for 128 days (4 weeks + 100 days).

Importantly, you’ll also need to verify that the data you’re exporting is unsampled data.

Once you have that data, you’ll need to add it to the first tab of the Gsheet, which will look like this:

Step 6 – Run your data through Distilled’s DIY Split Tester.

Now that you’ve gathered 100 days of unsampled data for both your control group, and your variant group, you’ll head over to Distilled’s DIY Split tester and input your data so that you can retrieve an output of the causal impact forecast that we discussed earlier on in this post. 

Seriously, the fact that we can get this output for free is just incredible. Shoutout to the Distilled / Search Pilot team who’ve made it possible.

Before you export the forecast, make sure that you’ve also got the start date accurately filled out. 

Step 7 – Plot the split tester’s output in Google Sheets.

The DIY split tester will produce a CSV output for the causal impact forecast (oddly enough, this output is formatted without the file extension, so in order to open it you will need to add a “.csv” to the end of the file). 

Once you’ve opened the CSV, you’ll need to copy and paste the raw output data into the second tab of the Gsheet (Columns B – E).

This is where you’ll finally get to visualize your split test’s results, and measure for statistical significance.

Step 8 – Measure for statistical significance.

The output tab of this Google Sheet includes 3 visual graphs. The main graph that you’ll want to pay attention to for statistical significance is the Cumulative Difference Graph, which looks like this:

Here’s how to read the graph without having to dive deep into the inner-workings of CausalImpact and the advanced Bayesian forecasting models. 

The blue line (middle) represents the cumulative difference between your variant (“B group”) data and your control (“A group”) data. 

On either side of the blue line, you’ll see the cumulative upper bound (red line) and the cumulative lower bound (yellow line). Each of these bounds represents the band of error that exists in the statistical model. In other words, the red line shows how our experiment would look in a best-case scenario, and the yellow line shows how our experiment would look in a worst-case scenario. 

The entire band of error represents a 95% confidence interval.

Now that we understand each of these, there are just two main points to look at to know if you’ve reached statistical significance. 

First, there’s your experiment’s start date, which is the point in time where all 3 lines begin to diverge.

The second, and most important, point is where all three lines cross over the X axis. 

It’s critical to note that if all three lines do not cross over the X axis, then your experiment was not statistically significant, and your results are neutral. 

And, if all three lines cross over in a downtrend, with the red (upper bound) line falling below the X axis, then your experiment produced a statistically significant negative result.
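Once you’ve pulled the cumulative lower-bound and upper-bound series out of the sheet, these reading rules reduce to a simple check on the final values. A small Python sketch (the function name and data are hypothetical):

```python
def read_cumulative_bounds(cum_lower, cum_upper):
    """Interpret the final values of the cumulative lower-bound (yellow)
    and upper-bound (red) lines: significance requires the whole error
    band to finish on one side of the X axis (zero)."""
    if cum_lower[-1] > 0:   # even the worst case ends above the X axis
        return "significant positive"
    if cum_upper[-1] < 0:   # even the best case ends below the X axis
        return "significant negative"
    return "neutral (not statistically significant)"

# hypothetical cumulative bounds from the Google Sheet output tab
print(read_cumulative_bounds([-5, 2, 10, 25], [40, 80, 120, 160]))  # → significant positive
print(read_cumulative_bounds([-10, -6, -3], [12, 8, 4]))            # → neutral (not statistically significant)
```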

Step 9 – Ensure that the winning result gets implemented on all pages.

Often forgotten, but highly critical: make sure that whichever group produced the winning outcome gets rolled out across all pages. 

Step 10 – Rinse and repeat. ABT!

And, of course, ABT! (Always Be Testing) 

Don’t forget to look for a new hypothesis and continue testing for greater insights and performance improvements.

Tools to assist your SEO A/B testing projects

Distilled’s DIY Split Tester – The DIY split tester from Distilled is more difficult to use than the other tools on this list. Recommended for teams with advanced SEO testing capabilities.

SEOTesting.com – SEOTesting.com is best for teams with lower budgets who are comfortable measuring their SEO split tests without full statistical significance.

SearchPilot – SearchPilot’s server-side testing technology enables enterprise teams to run SEO split tests with very high accuracy. Best for enterprise teams.

SplitSignal – Similar to SearchPilot, but with technological variances, SplitSignal helps teams run SEO split tests to true statistical significance.

What about Google Optimize? – Contrary to popular belief, CRO tools like Google Optimize are NOT designed for SEO testing.

Resources to assist your SEO A/B testing journey

The following resources are my go-tos for SEO testing education and further reading/watching:

SEO Testing Bundle

SearchPilot’s SEO testing case studies – These SEO testing case studies are some of the most fascinating examples of SEO split tests in action.

SEO A/B Testing With Google Tag Manager – Evan Hall wrote a technical and informative guide to SEO A/B testing using Google Tag Manager.

A Year of SEO A/B Testing – This dinner-table discussion from Dominic Woodman is equal-parts entertaining and educational.

How to Run an SEO A/B Test [Template Included] – SMA Marketing created an extremely helpful video to show people how to properly run an SEO A/B test with Distilled’s DIY Split Tester.

Why You Can’t Use CRO Tools to Run A/B Split Tests for SEO – Detailed breakdown on SEO testing vs. CRO testing and why you can’t use CRO tools for SEO tests.

How to Grow Traffic with SEO Title Testing [In-Depth] – This in-depth guide provides the exact steps for running a time-based SEO test.

Conclusion

I hope you’ve enjoyed this piece. SEO split testing can be a confusing and difficult process, so if you’re running into any roadblocks, or if you’d like to ask more questions about this material, please don’t hesitate to reach out and I will be happy to help you along in your SEO testing journey.