How to do an SEO Internal Link Audit and Topic Clustering using ML

If you prefer to watch this blog post in video format, check out my presentation on YouTube:

Before we dive in, I wanted to present a quick Q&A for those new to the topic:

What is internal linking?

An internal link connects one page of a website to a different page on the same website. In other words, it’s a link whose source domain and target domain are the same.

Why is adding internal links important?

Internal links help crawlers navigate the website and its architecture, allow page authority to be distributed from one page to others throughout the site, and help new pages get discovered. This improves the website’s funnel in three areas: search optimization, user experience, and conversion optimization. Interlinking also enables the formation of content clusters, which indicate the content’s context and hierarchy, saving Google some effort.

What text should you use when linking to pages with one another?

The text you link from is recorded in code, while search engines also take into account the text nearby. The linking text is typically referred to as ‘anchor text’.
The best way to interlink is to use a link of high value to the user, which is informative and matches the context of the content they are reading. That is a surefire way to get them to click. Internal links indicate to search engines that the link you give is of such high relevance and importance that visitors were missing out by not having a direct link.

How to easily find relevant posts to link from your blog post?

The two most important things you need to take into account when interlinking your blog posts are: (1) their relevancy to one another, and (2) the text that you are linking from. Here’s how to find relevant posts on a platform such as Medium, using a Google search operator.
Type in the following query:
site:medium.com "your name" <the topic or term you want to find relevant articles for>
e.g. site:medium.com "Lazarina Stoy" interlinking
This query allows you to search a specific site (i.e. medium.com) for a page containing a unique term (i.e. your name, as you are the author) and a broad term (i.e. interlinking; if you put this in quotation marks, “like this”, it will search for an exact match).
The results will be based on popularity and relevancy to your query, so just what we need.
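
If you run this kind of search often, the query can be assembled programmatically. The snippet below is a minimal sketch that only builds the query string and a search URL; the helper names `build_site_query` and `to_search_url` are hypothetical and made up for illustration, and the example values are the ones used above.

```python
from urllib.parse import quote_plus


def build_site_query(site: str, author: str, topic: str, exact: bool = False) -> str:
    """Compose a site: query for one author and one topic.

    When exact is True the topic is wrapped in quotation marks,
    which asks the search engine for an exact match.
    """
    topic_part = f'"{topic}"' if exact else topic
    return f'site:{site} "{author}" {topic_part}'


def to_search_url(query: str) -> str:
    """URL-encode the query so it can be opened in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_site_query("medium.com", "Lazarina Stoy", "interlinking")
print(query)
print(to_search_url(query))
```

Passing `exact=True` wraps the topic in quotation marks, which mirrors the exact-match variation described above.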

Which Google algorithms are relevant to internal links?

Google’s initial algorithm, PageRank, relied heavily on links for content discovery and ranking, assessing the quality of links between pages. However, this led to the rise of link spam, prompting Google to adopt more sophisticated ranking methods. In 2015, Google introduced RankBrain, a machine learning system building on the Hummingbird update, shifting focus from keywords to topics and user intent. RankBrain also emphasized the importance of optimizing websites around topics and streamlined internal linking strategies to improve website accessibility for search engines and crawlers. More recently, Google announced MUM, aiming to link and understand data across various formats (video, image, text) and surface relevant information in the most appropriate format, recognizing semantic relationships between topics, entities, and subtopics.

While links are still a crucial part of content discovery, in the present day content is king. Without good content, context becomes meaningless. However, if the quality of the content on your website is there, context (in other words, links) becomes the vessel to bring it to life. Internal linking is a link strategy that is entirely in your control.

How to do an SEO internal linking audit 

Step 1: Define the aims of the audit

The purpose of the internal linking audit is to identify the site’s internal linking opportunities, as well as highlight the topic clusters and entity-relationship structure on the website between existing content.

This process, however, can also highlight mismatches between the existing content on the website and the business aims, goals, and direction.

With that said, there are four broad aims of internal linking strategies:

Establish Topic Authority

This is achieved via internally linking informational pages, based on similarity and topic relevance, using the key terms that frequently occur in the topic cluster.

Boost Money-making pages

By providing links from pages with informational intent to the most converting pages within each topic cluster, ideally using the most important terms for users for this particular page (i.e. top queries from Search Console), we can signal page importance within the topic clusters.

This can create a hierarchy, where ideally, the page that is emphasized is the best page, not only for users (i.e. most comprehensive review of the topic, i.e. the pillar page) but also for the business as well – or otherwise the page that makes the most out of user visits via the desired goal completion.

Enrich Search Intent

Search Intent Enrichment is the practice of enabling different perspectives of a topic, problem, or solution for a user while they are on your website, to reduce pogo-sticking and the strain on Google having to re-index all pages, based on a new query entered by the user.

For this process to be as effective as possible, you should aim to move visitors from low to high-intent pages, promoting pages with high conversion rates and comprehensive topic coverage. 

Build Inter-Cluster relationships

Inter-cluster relationships are achieved via providing links from and to pages that somehow align to more than one topic cluster. This can be easily identified via practices like entity recognition or topic modeling alignment, which will be presented later on in the guide.

Step 2: Understand and select the KPIs and potential (measurable) outcomes of the internal linking audit

Before doing the audit, you should understand the KPIs and input/output metrics to look out for performance-wise. There are several benefits of internal linking. Let’s go through each of them, why they are observed, and the metrics to monitor for performance improvements.

Internal linking enables you to establish a clearer and more comprehensive site structure

Internal linking plays a crucial role in establishing a clear and comprehensive site structure. When implemented coherently and strategically, it creates an architecture that is rooted in content relevancy, user intent, and the interrelationships among different pages. This approach not only enhances the user’s navigational experience but also strengthens the thematic consistency and contextual understanding of your site, both for users and search engines.

Internal linking allows for improved indexing of the site

When Google crawls a web page it identifies new content via the presence of internal links, which allow it to add to its index content that is seen for the first time, or that is not indexed yet. A site that is thoroughly linked based on entities and established semantic relationships can benefit from seeing its new content more quickly discovered by search engines.

Enhanced website navigation and user experience

In an article about a search intent-driven website architecture, I emphasized the importance of enriching search intent for an improved user experience. Internal links are the vessel that enables this to be achieved.

Via the use of internal links a website owner can enhance the search intent of visitors from informational, to navigational, via commercial and transactional – all without leaving the website. They can also enhance their intent by introducing them to other topics, or otherwise – expand their search horizons, if the user has a purely informational need.

Improved site engagement

By effectively interconnecting content, you encourage users to explore more pages per visit, increasing page views and deepening engagement with your site. Additionally, a well-executed internal linking plan can significantly reduce the bounce rate, as visitors are more likely to find relevant and interesting content, keeping them engaged for longer periods and reducing the likelihood of them leaving the site quickly.

Increased number of long-tail keywords

Adding internal links can improve search engine rankings for long-tail keywords. By strategically distributing page authority through internal links, especially using relevant anchor texts, your site gains an edge in contextual relevance for specific long-tail keywords. Additionally, a well-organized internal linking structure not only improves user engagement and session duration but also uncovers opportunities to optimize content for more specific, targeted keywords.

Improved CTR

The image below shows an annotated chart of CTR before and after the internal linking strategy was deployed on a website, with the author highlighting that the organic CTR rose in one week from 3.07 to 4.13, an increase of 34%.

image 18

According to the author, 7 weeks later the organic CTR was measured at 5.61, representing an increase of 81% (while the average positions reported in the Search Console account increased overall, due to the pages ranking for more long-tail keywords).

Organic traffic boost

image 19

In the same case study, the site experienced a boost in organic traffic as well, following the automated deployment of internal linking. Between the date of the links deployment and the measurement benchmark (7 weeks later), the traffic had multiplied by 2.5 (with no additional SEO-focused work being done in the meantime).

Increased Goal Completions to Targeted Pages

Quick note that increasing goal completions to targeted pages can be achieved via internal linking only when combined with intent-focused linking strategies and appropriate CRO initiatives.

Yet should the aim of your SEO work involve increasing conversions, this article highlights some techniques and strategies that will enable you to target high-intent, high-converting pages (or, as I like to call them, money pages). If all other benefits of the internal linking strategy are observed, this will also increase conversions on those pages.

Many CRO strategies can be implemented on pages that aim to achieve conversions, and while that is not the focus of this article, the idea behind internal linking is to increase the visibility of money-making pages. In this sense, if the pages you are targeting are optimized for your targeted goal completion, in theory you can observe an uptick in those metrics, too.

However, that will not be a direct result of the internal linking activity, but an effect of having a site that is optimized overall for search intent fulfillment and CRO, and that has benefited from better comprehension by crawlers.

Step 3: Construct your internal linking audit’s structure – here’s what to include

An analysis of the internal linking structure would involve looking at the following things.

| Area of analysis | What to include? | What could this impact? |
| --- | --- | --- |
| Link Frequency | # inlinks; # unique inlinks per page; Orphaned pages | Crawling; Page discovery |
| Link Quality | Link placement; Source traffic; Added link sections | Engagement; Comprehension |
| Anchor text | Type; Length | Engagement; Comprehension |
| Hierarchy | Top pages; Presence of topics, tags, or categories; Crawl depth | Engagement & UX; Crawling |
| Mistakes | Broken links; Nofollow tags; UTM parameters | Crawling; Indexing; Traffic |

Total number of internal links pointing to a page

Via the number of internal links, you can establish a hidden page hierarchy, emphasizing the importance of some pages over others. This can include looking at the number on average of internal links per page and unique internal links. It can also include the average inrank of the pages if you are using a service like Oncrawl or are willing to calculate it yourself.
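
If you want to calculate these numbers yourself, here is a minimal sketch, assuming your crawler can export the internal links as (source, target) pairs. The `EDGES` list is made-up sample data, and `internal_pagerank` is a simplified PageRank-style score computed over internal links only. It is not Oncrawl's Inrank formula, which may well be calculated differently.

```python
from collections import defaultdict

# Hypothetical crawl export: one (source URL, target URL) pair per link found.
EDGES = [
    ("/", "/blog/"),
    ("/", "/services/"),
    ("/blog/", "/blog/post-a/"),
    ("/blog/", "/blog/post-b/"),
    ("/blog/post-a/", "/services/"),
    ("/blog/post-a/", "/services/"),  # the same link repeated on the page
    ("/blog/post-b/", "/services/"),
]


def inlink_counts(edges):
    """Return {page: (total inlinks, unique linking pages)}."""
    total = defaultdict(int)
    sources = defaultdict(set)
    for src, dst in edges:
        total[dst] += 1
        sources[dst].add(src)
    return {page: (total[page], len(sources[page])) for page in total}


def internal_pagerank(edges, damping=0.85, iterations=50):
    """Simplified PageRank over the internal link graph (power iteration)."""
    pages = sorted({page for edge in edges for page in edge})
    outlinks = {page: set() for page in pages}
    for src, dst in edges:
        outlinks[src].add(dst)
    rank = {page: 1 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / len(pages) for page in pages}
        for src in pages:
            # Pages without outlinks spread their score evenly over all pages.
            targets = outlinks[src] or set(pages)
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


counts = inlink_counts(EDGES)
rank = internal_pagerank(EDGES)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, counts.get(page, (0, 0)), round(rank[page], 3))
```

Sorting by the score gives a quick view of the hidden hierarchy: the pages at the top are the ones your internal links currently emphasize, whether or not that was intended.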

Orphaned pages

Orphan pages are not linked from any other page on your website, so they just sit idle and don’t receive any link juice. Unless they are listed in a sitemap or linked from another site, crawlers have no path to discover them, so Google may not know they exist.

This issue can become huge for bigger websites, so discovering differences between what you think is on Google, internal links, and actual Google-discovered URLs is vital.
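
As a rough sketch of that comparison, assuming you have a list of URLs you believe exist (for example from a sitemap or a CMS export) and the internal links found by a crawl, orphan candidates are simply the known URLs that never appear as a link target. The function name and sample data below are hypothetical.

```python
def find_orphans(known_urls, edges, homepage="/"):
    """Return known URLs that no other page links to.

    known_urls: URLs you expect to exist (sitemap, CMS export, etc.).
    edges: (source, target) internal link pairs from a crawl.
    Self-links are ignored, and the homepage is excluded because it is
    the crawl's entry point rather than a page that needs an inlink.
    """
    linked = {dst for src, dst in edges if src != dst}
    return sorted(set(known_urls) - linked - {homepage})


known = ["/", "/blog/", "/blog/post-a/", "/blog/old-post/", "/landing/promo/"]
links = [
    ("/", "/blog/"),
    ("/blog/", "/blog/post-a/"),
    ("/landing/promo/", "/landing/promo/"),  # a page linking only to itself
]
print(find_orphans(known, links))
```

The same set difference works in the other direction, too: URLs found in the crawl but missing from the sitemap are worth a look as well.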

When doing an internal link analysis, what can help is to highlight patterns in the way pages are linked to and from main top-level and sub-directories.

Quality of those internal links pointing to the page (and its effect)

Quality in the context of internal links can be assessed by things like the link placement on the page, the context of the link, and the technical setup of how the link is provided. On technical setup, a common mistake to look out for is placing internal links in buttons and expecting them to pass link equity. As confirmed by John Mueller, Google does not click on button links, so if it is visually important to style a link as a button, something to consider is to:

Use normal HTML links and just style it with CSS to make it look like a button rather than to use button elements in HTML and add JavaScript that kind of makes them act like a link

John Mueller, quoted in SEJ

When doing this analysis, also look into whether you have any “read more”, or “click here” buttons intended for internal linking in that same context. Assess the traffic passed on from links in different top-level and sub-directories from and to one another.
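
To flag these at scale, here is a minimal sketch, assuming a crawl export of (source, target, anchor text) rows. Only “read more” and “click here” come from the text above. The other phrases in `GENERIC_ANCHORS` are assumptions, so adjust the list to your own site.

```python
# Phrases treated as generic. "read more" and "click here" are from the
# article; the rest are illustrative assumptions.
GENERIC_ANCHORS = {"read more", "click here", "learn more", "here", "more"}


def normalize(anchor: str) -> str:
    """Lowercase, collapse whitespace, and trim trailing punctuation."""
    return " ".join(anchor.lower().split()).strip(" .!:>")


def flag_generic(links):
    """Return the rows whose anchor text gives no context about the target."""
    return [row for row in links if normalize(row[2]) in GENERIC_ANCHORS]


rows = [
    ("/blog/post-a/", "/services/audit/", "Read more..."),
    ("/blog/post-a/", "/blog/post-b/", "internal link audit checklist"),
    ("/blog/post-b/", "/contact/", "Click   here!"),
]
for source, target, anchor in flag_generic(rows):
    print(source, "->", target, repr(anchor))
```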

Consider the link sections you have added on the site as well:

  • related or popular posts recommended in the blog
  • category pages and tag pages
  • breadcrumbs navigation and appropriate schema added

Type of anchor text of internal links, pointing to a page

Semrush previously discussed the different types of anchor text that you can use for internal linking, including the presence of branded versus non-branded keywords, targeted keywords in the anchor text, exact matches, and so on. Having a variety of anchor text can help enhance both crawler and user comprehension of the link destination, enhancing the user experience.

Length of the anchor text of internal links

While this is not confirmed, there is some evidence to suggest that longer anchor text can be of value for internal linking efforts. The reason is unsurprising: longer text can provide more context about the link’s destination.

Link placement

The placement of the link on the page can also be a signal. For instance, it makes a huge difference whether a link is placed in a navigation menu (and which one: top navigation, footer, or sidebar) or in the content. Links placed higher up the page are thought to attract more clicks, though there is no official research on this, only probability-based calculations.

Top pages, or otherwise most linked pages site-wide

Top pages can be considered a factor for ranking, which is why it’s good to report on what those pages are for your website. This is what Google refers to as “a naturally flowing hierarchy”.

Presence of breadcrumbs navigation

The presence of breadcrumbs allows user navigation via internal links to the previous section or the root page. This, alongside the appropriate breadcrumb structured data markup, is recommended by Google.

Crawl and link depth

All important content should be 1-4 clicks away from the homepage and while, of course, this can vary greatly for enterprise sites, there are clear benefits to having user- and business-important webpages close to the homepage. This can lead to improvements being made to the information architecture of the website, enabling better comprehension of the site’s content.
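
Click depth can be computed from a crawl export with a breadth-first search from the homepage. The sketch below assumes the links are available as (source, target) pairs; the function names are hypothetical, and the threshold of 4 is taken from the guideline above.

```python
from collections import deque


def click_depths(edges, homepage="/"):
    """Return {page: minimum number of clicks from the homepage}."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, ()):
            if nxt not in depths:  # first visit is the shortest path in BFS
                depths[nxt] = depths[page] + 1
                queue.append(nxt)
    return depths


def too_deep(depths, max_depth=4):
    """Pages that sit more than max_depth clicks away from the homepage."""
    return sorted(page for page, depth in depths.items() if depth > max_depth)


chain = [
    ("/", "/a/"), ("/a/", "/b/"), ("/b/", "/c/"),
    ("/c/", "/d/"), ("/d/", "/e/"), ("/", "/c/"),
]
depths = click_depths(chain)
print(depths)
print(too_deep(depths, max_depth=2))
```

Pages that never show up in the result are unreachable from the homepage through internal links, which makes this a second way to surface orphan candidates.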

Broken internal links

Broken pages on the website signal bad site hygiene and result in a negative user experience. What is worse, if those broken pages are the destination of a large number of internal links, this can negatively affect the pages that link to them, too, as suggested by a Web Page Decay patent, which reads:

“If the web page has a relatively large number of dead links, it is assessed as being a stale web page.”

Web Page Decay patent
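
To see which pages carry a relatively large share of dead links, here is a minimal sketch. It assumes you have the internal links as (source, target) pairs and an HTTP status code for each target from your crawler. Treating any status of 400 or above as “dead” is an assumption made for this illustration, not something the patent specifies.

```python
from collections import Counter

# Hypothetical crawl data.
LINKS = [
    ("/blog/a/", "/old-page/"),
    ("/blog/a/", "/retired/"),
    ("/blog/b/", "/old-page/"),
    ("/blog/b/", "/services/"),
]
STATUS = {"/old-page/": 404, "/retired/": 410, "/services/": 200}


def dead_link_report(links, status_by_url):
    """Return (share of dead links per source page, inlinks per broken target)."""
    total = Counter()
    dead = Counter()
    broken_targets = Counter()
    for src, dst in links:
        total[src] += 1
        status = status_by_url.get(dst)
        # Targets without a recorded status are left out of the dead count.
        if status is not None and status >= 400:
            dead[src] += 1
            broken_targets[dst] += 1
    share = {src: dead[src] / total[src] for src in total}
    return share, broken_targets


share, broken_targets = dead_link_report(LINKS, STATUS)
print(share)
print(broken_targets.most_common())
```

The second output is useful for prioritization: fixing or redirecting the broken targets with the most inlinks clears the largest number of dead links at once.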

Nofollow internal links

Adding nofollow to internal links is generally bad practice. Using the nofollow link attribute communicates that you don’t know if you trust a link, so using it on internal links gives conflicting signals to Google about the pages on your website.

While some SEOs claim that there are specific cases where you can and should use nofollow tags, such as login screens or comments on blog posts that you might want to also have indexed, this is not supported by the official Google stance on the matter.

Here is a video by Matt Cutts, explaining whether or not to use nofollow attribute on internal links – and the answer is a plain and simple no, don’t use nofollow on internal links. The reason is that you would like the PageRank to flow naturally throughout your website, and when the crawler encounters a nofollow that will cause these links to drop out of the link graph, hence they don’t flow PageRank anymore.

In the video, Matt also touches upon whether or not you should have a nofollow to the login page and while he recognizes that this might be a page that in some sites will have nofollow attribute to the link, it wouldn’t hurt the site if the nofollow is not there.
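
If your crawler does not report these, a page’s HTML can be checked directly. The sketch below uses Python’s standard library only and flags internal links that carry `rel="nofollow"` or UTM parameters (both listed under mistakes in the audit structure). The class name, the example URL, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse


class InternalLinkChecker(HTMLParser):
    """Collect internal links that use nofollow or UTM parameters."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.issues = []  # list of (absolute target URL, issue label)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        target = urljoin(self.page_url, href)  # resolve relative links
        parsed = urlparse(target)
        if parsed.netloc != self.host:
            return  # external link (or mailto:, tel:, etc.)
        rel = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel:
            self.issues.append((target, "nofollow"))
        if any(key.startswith("utm_") for key in parse_qs(parsed.query)):
            self.issues.append((target, "utm"))


SAMPLE_HTML = """
<a href="/login" rel="nofollow">Login</a>
<a href="/pricing?utm_source=blog">Pricing</a>
<a href="https://other.example.org/" rel="nofollow">Partner</a>
<a href="/about">About</a>
"""

checker = InternalLinkChecker("https://example.com/blog/")
checker.feed(SAMPLE_HTML)
print(checker.issues)
```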

While most of the analysis presented above can be completed with a quick crawl and a clever Data Studio or Google Sheets template, I would like to encourage you to look into a few additional areas, which can help you see your website, content, and links, the way search engines see it, and as a result – better understand how to improve the internal link structure.

Where can you incorporate machine learning in this analysis? (and should you)

In this section, I would like to include several resources to help you conduct the audit part of the internal linking strategy a bit quicker.

How to classify Anchor Text with Machine Learning

Anchor text classification can be extremely powerful, as when involving more complex machine learning models, such as GPT-3, it can enable complex class labeling of anchor text types at scale in minutes.

You can use different types of classification here, but here are a few use-cases:

Using these resources, here are a few scenarios on what you can quickly identify, using the content of your website:

  • Identify the most frequently mentioned bi-grams and tri-grams (or otherwise two or three-word phrases) in the anchor text or in the content
  • Categorize your anchor text, based on product or service category using GPT-3
  • Categorize your anchor text, based on whether it’s branded, non-branded or compound
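
For the first and last scenario, a simple rule-based baseline needs no model at all. To be clear, the sketch below is not the GPT-3 approach from the resources mentioned. It is a plain counting and keyword-matching stand-in, and it assumes that “compound” means anchor text mixing brand terms with other words. The brand name “Acme” is a placeholder.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z0-9']+")


def ngrams(text, n):
    """Split text into lowercase words and return its n-word phrases."""
    words = WORD.findall(text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def top_ngrams(anchors, n=2, k=5):
    """Most frequent n-grams across a list of anchor texts."""
    counts = Counter(gram for anchor in anchors for gram in ngrams(anchor, n))
    return counts.most_common(k)


def classify_anchor(anchor, brand_terms):
    """Label an anchor as branded, non-branded, or compound.

    brand_terms: a set of lowercase single-word brand names.
    """
    words = set(WORD.findall(anchor.lower()))
    brand_hits = words & brand_terms
    if not brand_hits:
        return "non-branded"
    if words == brand_hits:
        return "branded"
    return "compound"  # assumption: brand terms mixed with other words


anchors = ["internal link audit", "internal link checklist", "Acme", "Acme link audit"]
print(top_ngrams(anchors, n=2, k=3))
for anchor in anchors:
    print(anchor, "->", classify_anchor(anchor, {"acme"}))
```

A baseline like this is handy for sanity-checking the labels a larger model produces before you rely on them in the audit.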

Both resources mentioned are extremely beginner-friendly, and the authors (Greg and Danny) have done an amazing job to make these resources into a format that is easily digestible for beginners and very easy to implement as well.

Entity Identification using Machine Learning for Internal link

Entity identification is the process of identifying and labeling entities. In other words, this is the process of inspecting a given text for known entities (people, organizations, landmarks, countries, etc.) and returning information about those entities.

Here, I’d like to point your attention to several methods:

The idea here is that you need to identify the different entities that are prominent across the content of your website and also see what types of entities you most commonly reference.

To make sure the site can be appropriately comprehended, review and identify the main entities of the website using entity recognition software or a tool. At this stage, it’s also worth thinking about the relationships between these entities, not only identifying them.

Knowledge Graph Entity Relationship Visualisations using Machine Learning

To visualize the relationships between entities, you can use different knowledge graph visualization services, such as Neo4J, KgBase, or the Google Knowledge Graph Search API. Or you can build a knowledge graph yourself using this handy tutorial for building a knowledge graph using BERT, spaCy, and NLTK by Pavan Sanagapati.

The idea here is that you would like to see what the links are between these entities on your site. Consider how different entities are related to one another via exploring them through a Knowledge Graph Explorer.
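
Before reaching for a visualization service, a first look at these relationships can come from simple co-occurrence counts. The sketch below assumes you already have the entities extracted per page by whichever entity recognition tool you use. The sample data and function names are made up, and co-occurrence on a page is only a rough proxy for a real relationship.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of an entity recognition step: page -> entities found.
ENTITIES_BY_PAGE = {
    "/blog/a/": ["Google", "PageRank", "RankBrain"],
    "/blog/b/": ["Google", "PageRank"],
    "/blog/c/": ["Medium"],
}


def entity_edges(entities_by_page):
    """Count on how many pages each pair of entities appears together."""
    edges = Counter()
    for entities in entities_by_page.values():
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def shared_pages(entities_by_page, a, b):
    """Pages mentioning both entities, i.e. candidates to link to each other."""
    return sorted(
        page for page, entities in entities_by_page.items()
        if a in entities and b in entities
    )


edges = entity_edges(ENTITIES_BY_PAGE)
print(edges.most_common())
print(shared_pages(ENTITIES_BY_PAGE, "Google", "PageRank"))
```

The resulting pairs and counts can be loaded as weighted edges into a graph tool if you do want the visual version.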

After that, as pointed out by Dave Davies, you should take this information and put it in the context of the users’ devices and locations, as well as the SERP for your target markets. After understanding the different factors that come into play for content comprehension, you will be in a better position to optimize content semantically.

More insights will lead to better recommendations.

Step 4: Identify the main topics of your website to create content clusters or group pages

We previously mentioned the importance of a comprehensive internal linking strategy. There are different ways of achieving this, but a general method is a hub-and-spoke approach, which is essentially a topic-oriented site architecture model.

In order to appropriately execute this model, the website (or website section, e.g. blog) needs to be crawled and organized into topics, with each topic cluster having three main components:

  • a pillar page – a comprehensive resource page that covers the core topic in-depth and links to the high-quality content, created for supporting subtopics. This page is what enables the users to find the related content more easily and streamlines the content discovery and comprehension process for web crawlers
  • cluster content – related content that covers subtopics within the content cluster, organized logically and coherently to tell a user story
  • hyperlinks – all content is connected via links, which enable distribution of PageRank between connected pages and within the website.

Topic clusters signal to Google that each individual long-tail keyword within a cluster is semantically related to the linked cluster content. These signals are facilitated via the hyperlinks, the context of the links, and the frequency of links, collectively increasing the search visibility of connected pages.

Content clusters in machine learning are typically referred to as topic models, and while they are a little bit different, the fundamentals are the same. The old way of presenting a page was to have it as a standalone page and rely on backlinks to this page in order to signal its importance. The new way is to put this page in the context of other existing content on your website.

I’ve written previously about topic modeling in machine learning, the methods, applications, and how topic modeling came about in my blog post: Topic Modelling: A Deep Dive Into LDA, Hybrid-LDA, And Non-LDA Approaches. To summarise the main points of this post that relate to the current discussion:

  • topic modeling was created with the intention of channeling computational tools to assist information discovery, pattern recognition, and to eliminate the reliance on query and link-based systems
  • LDA-based topic modeling works under the assumption of word-topic interchangeability, meaning that a word contained in the document is assigned a probability of being part of a given subtopic
  • The collection of words and their probabilities and presence throughout documents (e.g. pages of a website) helps define the size and content of subtopics in a corpus of data (e.g. a site)
  • LDA operates under the assumption that each document (page) has multiple subtopics

A lot has happened in the field of topic modeling over the past few decades. LSA, which dates back to around 1990, brought about the use of synonyms. In 2003, LDA was introduced, which operates under the assumption that words and topics can be interchanged. Then, in 2008, changes in how the probability computation was done allowed these models to be implemented at scale. Hybrid models came about a decade later, and since 2017 we have been seeing the introduction of deep learning into topic modeling.

But the problem that we as professionals face remains the same: we’re not using these models. So why don’t we try an ML-based solution?

We can use one of these two methods and two tutorials I’m linking here to kickstart a topic modeling journey, one of them being an LDA tutorial and the other one being an LSA, both using Python for the scripting:

Or you can just skip the code and jump straight in.

image 1
Process of using a no-code ML LDA app for identifying opportunities for internal links on your site

I recorded a video with a step-by-step tutorial on how to do topic modeling using a no-code, publicly available web-based application that uses LDA and was originally developed by Cornell.

In a nutshell, this video will show you how to crawl and export content on your website, how to upload your files to the web app, and fine-tune the model’s performance, and also download the files and explore them and build these files into your deliverable of the internal linking audit.

In the end, you’re going to have a couple of files. One of them shows the topic-to-topic similarity: how often the different topics identified on your website co-occur, or whether some of them don’t co-occur on the site at all and so should not be treated as related. Another output shows how each page on your site belongs to each topic, in other words the probability of this page belonging to this topic. You’re also going to get a Gensim file that enables you to quickly visualize, using a very cool interactive chart, the topics and the most frequent terms of each of them.
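
Once you have the page-to-topic probabilities, turning them into link groups takes only a few lines. The sketch below assumes the export has been loaded into a dictionary of page to probability list. The sample values, the 0.5 and 0.3 thresholds, and the function names are all assumptions for illustration, and the app’s actual file layout may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical document-topic probabilities: page -> probability per topic.
DOC_TOPICS = {
    "/blog/internal-links/": [0.80, 0.15, 0.05],
    "/blog/anchor-text/": [0.55, 0.40, 0.05],
    "/blog/topic-models/": [0.10, 0.85, 0.05],
    "/about/": [0.34, 0.33, 0.33],
}


def dominant_topic(probs, min_prob=0.5):
    """Index of the most likely topic, or None if no topic is clear enough."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= min_prob else None


def pages_by_topic(doc_topics, min_prob=0.5):
    """Group pages under their dominant topic (candidate content clusters)."""
    groups = {}
    for page, probs in doc_topics.items():
        topic = dominant_topic(probs, min_prob)
        if topic is not None:
            groups.setdefault(topic, []).append(page)
    return groups


def topic_cooccurrence(doc_topics, present=0.3):
    """Count pages on which two topics are both noticeably present."""
    pairs = Counter()
    for probs in doc_topics.values():
        active = [i for i, p in enumerate(probs) if p >= present]
        for a, b in combinations(active, 2):
            pairs[(a, b)] += 1
    return pairs


print(pages_by_topic(DOC_TOPICS))
print(topic_cooccurrence(DOC_TOPICS))
```

Pages grouped under the same topic are candidates for linking within a cluster, while pages where two topics are both present are natural bridges for inter-cluster links.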

Why should you do this?

  • You will get a baseline overview of the website, regardless of its size in less than 30 minutes.
  • You will save yourself a ton of time
  • You will test something new, risk-free, and if you are a newbie in the machine learning world – code-free, too!

Now let’s talk about what additional opportunities you can discover using machine learning. And to do that, we’re going to enter fuzzy matching into the conversation.

Step 5: Find similar content to implement internal links to using fuzzy matching

Fuzzy matching is a quick and dirty way of calculating the similarity between two strings.
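
As a minimal sketch of the idea, Python’s standard library ships a string similarity measure in `difflib`. This is one of several ways to do fuzzy matching and not necessarily the method used in the video or blog post. The 0.6 threshold and the sample titles are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity between two strings, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_pairs(titles, threshold=0.6):
    """Return (title, title, score) for every pair at or above the threshold."""
    pairs = []
    for i, a in enumerate(titles):
        for b in titles[i + 1:]:
            score = similarity(a, b)
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


titles = [
    "Internal linking audit",
    "Internal link audit",
    "Chocolate cake recipe",
]
for a, b, score in similar_pairs(titles):
    print(f"{score}: {a!r} <-> {b!r}")
```

Comparing every pair is fine for a few thousand titles. For larger sites, you would want to limit the comparisons, for example to pages within the same topic cluster.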

As with topic modeling, I recorded a video to show you how to use fuzzy matching in your internal link audit, and how to use it for other cool things, because there are so many resources about fuzzy matching created by people in our industry that you can utilize very quickly. I’ve also written a blog post on the topic of Fuzzy Matching, sponsored by the lovely folk at Ahrefs.

And when you are thinking about additional internal link opportunities, don’t forget to cover your bases. We’re talking things like:

  • related article sections and most popular sections, depending on what your site allows
  • breadcrumbs navigation and appropriate breadcrumbs schema implemented
  • a footer menu and footer navigation that are very well optimized in terms of information, intent, category structure, and the types of content available
  • no more “read more” or “click here” buttons or anchor text (!) – provide context of the link destination, always

Step 6: Find anchor text opportunities for internal links at scale with this Google Sheets Template

One of the most challenging things when doing an internal link audit is to find where to link.

For small sites, you can use the custom search feature of Screaming Frog.

For larger sites and bigger internal link initiatives, you can use my Free Google Sheets template for Interlinking Opportunities.

inlink set up 1

Set up your sheet, using the instructions.

Screenshot 77 1

Identify inlink opportunities at scale.

And for really big sites and really big internal link initiatives, you can use something like BigQuery in order to do essentially the same thing.

The logic is the same in all three approaches: you are trying to quickly find, at scale, the pages that contain the particular anchor text you’re interested in using to link to a page.
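
That shared logic can be sketched in a few lines. The example below assumes each crawled page is available as its body text plus the set of internal URLs it already links to. The data structure, URLs, and function name are hypothetical and do not reflect how the Google Sheets template is built.

```python
import re

# Hypothetical crawl export: page URL -> body text and existing internal links.
PAGES = {
    "/blog/a/": {
        "text": "A guide to topic clusters and why an internal link audit matters.",
        "links": set(),
    },
    "/blog/b/": {
        "text": "Our internal link audit checklist.",
        "links": {"/services/audit/"},
    },
    "/blog/c/": {"text": "Nothing relevant here.", "links": set()},
}


def find_opportunities(pages, keyword, target_url, context=30):
    """Pages that mention the keyword but do not yet link to the target.

    Returns (page URL, text snippet around the first mention).
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    results = []
    for url, page in pages.items():
        if url == target_url or target_url in page["links"]:
            continue  # skip the target itself and pages that already link to it
        match = pattern.search(page["text"])
        if match:
            start = max(0, match.start() - context)
            end = min(len(page["text"]), match.end() + context)
            results.append((url, page["text"][start:end].strip()))
    return results


for url, snippet in find_opportunities(PAGES, "internal link audit", "/services/audit/"):
    print(url, "|", snippet)
```

Running the same function over a list of keyword variations per target page gives you the branded, non-branded, and intent-based opportunities in one pass.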

At this stage, when you’re searching for your keywords and looking for places to link from, you have to search for intent-based keyword variations. You also have to search for different anchor types, for instance branded, non-branded, mixed, et cetera.

You have to incorporate entity-based linking and you have to also think about how you can boost money-making pages in the process.

So essentially, I’m just bringing your attention back to all of this analysis that we presented in the beginning that you already have available at your disposal. And also think about how you can boost informational intent pages, high intent pages, and pillar pages in order to create a structure that supports high conversions on your site.

image 3
How to incorporate intent into the internal linking structure of topic model-based linking – a sample model

Step 7: Organizing your internal link audit deliverable

What sections to include in your internal link audit

When presenting your internal link audit, you should include:

  • your analysis – present in a format that allows for benchmarking, and that is typically a Data Studio template or a Google Sheets template that will allow for a comparison of before and after.
  • the recommended topic models and structure, alongside internal linking recommendations and anchor text recommendations
  • additional internal link opportunities – highlight the different strategies that your client can use.
  • prioritization and budget for this initiative – think about how it is going to be implemented, who is going to be helping you with that, and what the timeline is.

How to prioritize internal linking initiatives when resources are limited

You can prioritize based on subtopic popularity or business importance.

You can also prioritize, based on subtopic popularity and search volume.

You can prioritize, based on the budget the client has for implementation or based on quick wins, combining the aforementioned approaches.

How to measure the success of internal linking activities

There are a few ways that the success of internal linking can be measured.

Set up Google Data Studio Reports

Set up a custom Google Data Studio report to track 360 performance for pillar pages and high intent pages in each topic cluster.

In Google Data Studio, you can also set up segments (via custom dimensions) for each topic cluster, observing organic search position changes for the pages that are included in the cluster, as well as for the terms (via queries position tracking) used for anchor text for these pages. This can help you identify whether the additional links, using a particular keyword as the anchor text on the page, have enabled the site to gain better positions for this query in search results.

This type of report can also be paired with a benchmark on performance for the internal linking metrics we discussed as part of the internal linking audit site-wide.

Create segments in Google Analytics

For each topic cluster, create a GA segment and observe user behavior flow, specifically measuring improvements in the flow between informational search intent pages to high-intent pages.

Monitor user engagement

User engagement is a key telltale sign of intent alignment and user satisfaction with the site – not only with the content matter but also with the navigation and other user experience settings you have enabled.

Observe the engagement metrics site-wide, monitor things like the number of page views per session, the page scroll rate, bounce rate, average session duration, etc. 

Track Benchmarking and Growth Metrics

Talking about tracking, here’s what you should measure. Your reporting should include benchmarking of all the different things that we talked about assessing in the beginning, and it should also include growth metrics, or in other words the expected outcomes.

You should look at both in order to compare what the initiative is actually doing with what you wanted it to do. For growth reporting, you should track three different pillars: traffic metrics, engagement metrics, and the number of queries.

As Will Critchlow from SearchPilot explains, internal link effect measurement is really hard, and our intuition about internal link structures is actually pretty poor.

In other words, failing is part of the process. So for that, it’s very important for you to know that you need to test different internal linking strategies. You need to do experiments frequently. You need to report and measure their effectiveness.

Document your learnings and see what actually works for your site, for your client, for the industry and niche that you are operating in. And those three things are the way to go in terms of ensuring that internal linking strategies are actually beneficial for your site.

This post is a follow-up on my talk at Brighton SEO in the Spring of 2022, on How To Incorporate Machine Learning in Your Internal Link Audit. If you want to check the slides, please do so over at Slideshare.