This article dives into semantic keyword clustering and how to automate this process. The original version of this article was published at KeyWI.io.
Semantic keyword clustering is an indispensable part of keyword research, but it is often a time-consuming and complicated task. According to a study by search marketer Paul Shapiro, an average keyword analysis takes between 5 and 28 hours.
Automating this process surely comes in handy! Fortunately, developments in artificial intelligence are accompanied by the rise of tools and techniques that help you automate keyword clustering.
But how do you automate it without racking your brain over complicated code, for example in Python?
Equally relevant: why do we actually cluster keywords? How do we ensure we group keywords properly? And how can KeyWI help you formulate data-driven content actions in just a few minutes?
In this article, I answer these questions and explain why automatic semantic clustering in most cases delivers better quality than manual clustering.
- From Hummingbird to MUM
- Why is semantic keyword clustering relevant?
- What does semantic keyword clustering mean?
- Semantic keyword clustering
- Why KeyWI?
1. From Hummingbird to MUM
In 2013, Google announced Hummingbird, the code name for a new algorithm update. It includes several components, one of which is RankBrain, which was announced in 2015.
Hummingbird is able to understand the semantics of a user’s search query. It considers the meaning of the whole search query, whether a phrase or a full sentence, rather than individual words.
In 2015, Google announced RankBrain, a machine learning technology and an extension of Hummingbird, which helps interpret search queries even better. RankBrain is able to see patterns between seemingly unrelated search queries and learn how they are similar.
So RankBrain works with Hummingbird to provide better results for user searches. However, RankBrain goes beyond semantic search alone. The self-learning algorithm applies what it learns to future search queries. These can be similar search queries, but also unfamiliar combinations of searches.
In 2019, Google introduced BERT into its search algorithm. This natural language processing (NLP) model interprets each word in relation to all the other words in a sentence. More recently, Google announced that it is developing a new technology that is 1,000 times more powerful than BERT: MUM.
“MUM is a technique that enables transferring knowledge across languages. MUM not only understands language, but also generates it. It’s trained across 75 different languages and many different tasks at once, allowing it to develop a more comprehensive understanding of information and world knowledge than previous models.” (Nayak, Google, 2021)
MUM is also multimodal: it can understand information across different formats, such as text and images, at the same time.
“.. MUM is multimodal, so it understands information across text and images and, in the future, can expand to more modalities like video and audio.” (Nayak, Google, 2021)
According to Google, search engines are not yet able to directly answer very complex search queries. The models often do not ‘understand’ the context or the latent needs behind them, so users have to perform multiple searches before they find their answer.
With MUM, Google comes closer to providing instant answers to complex questions. Think, for example, of the “next query you’re going to type in”: Google would be able to display or suggest the answer to your third question during your first search. Latent needs surface sooner and become more apparent.
It is therefore essential for Google’s models to understand the intent and informational needs hidden behind particular search queries.
2. Why is semantic keyword clustering relevant?
Two (or more) seemingly unrelated search queries can address the same informational needs and intent of the user.
Let’s take the following example to illustrate.
Search query 1 & 2
“Arabica coffee beans”
“Robusta coffee beans”
At first glance, it seems like the keywords have the following intents and informational needs:
- Informational intent: information on what Arabica / Robusta coffee beans are (and where you can possibly buy them)
- Commercial / transactional intent: chances of Google Ads and Shopping campaigns being displayed
If you were to manually group these keywords purely on syntax or the literal meaning of the search terms, an obvious cluster name would be “types of coffee beans”. There is a semantic relationship, but is this information sufficient to formulate actions or draw logical inferences as an SEO marketer?
A Google search for ‘Robusta coffee beans’ shows that 2 of the top 3 results are blog articles addressing the (quality) differences between Arabica and Robusta coffee beans.
Similar search results pop up with ‘Arabica coffee beans’.
The semantic link ‘type of coffee beans’ is obvious at first glance, but without further analysis you might overlook the fact that the user intent and informational needs are virtually the same for both search queries.
This is the power of RankBrain.
Ads indicate that commercial and transactional intent are present too. Yet, except for one URL, the organic search results contain no pages that respond to these intents, such as an e-commerce landing page featuring an overview of Robusta (or Arabica) coffee beans. Apparently, the informational intent and need dominate these search queries.
The above insights could only have been collected by analysing the SERP separately for each keyword.
Carrying out this analysis manually for a keyword research of 5,000 or more search terms is simply not feasible.
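Part of that per-keyword SERP analysis can be scripted. The sketch below is a minimal, rule-based illustration in Python, not how KeyWI works: it assumes you already have a hypothetical `SerpSnapshot` per keyword (whether text or Shopping ads appear, and the page types of the organic results, for example from a rank-tracking export) and turns those observations into rough intent signals. The rules and the `top_n` threshold are assumptions chosen to mirror the coffee bean example.

```python
from dataclasses import dataclass


@dataclass
class SerpSnapshot:
    """Hypothetical summary of one keyword's Google results page."""
    keyword: str
    has_text_ads: bool
    has_shopping_ads: bool
    # Page types of the organic results in ranking order, e.g. "blog", "category", "product".
    organic_page_types: list[str]


def intent_signals(serp: SerpSnapshot, top_n: int = 3) -> set[str]:
    """Derive rough intent signals from SERP features with simple, illustrative rules."""
    signals: set[str] = set()
    if serp.has_text_ads or serp.has_shopping_ads:
        # Advertisers bidding on the query hints at commercial/transactional intent.
        signals.update({"commercial", "transactional"})
    top = serp.organic_page_types[:top_n]
    blog_count = sum(1 for page_type in top if page_type == "blog")
    if top and blog_count * 2 > len(top):
        # Blog articles in the majority of the top spots suggest informational intent.
        signals.add("informational")
    if any(page_type in ("category", "product") for page_type in top):
        # Shop pages ranking organically suggest transactional intent.
        signals.add("transactional")
    return signals


if __name__ == "__main__":
    # Illustrative input modelled on the Robusta example: ads shown, 2 of the top 3 are blogs.
    robusta = SerpSnapshot("robusta coffee beans", True, True, ["blog", "blog", "other"])
    print(sorted(intent_signals(robusta)))  # ['commercial', 'informational', 'transactional']
```

Real SERPs call for far richer rules, so output like this is best treated as a starting point for review rather than a final intent label.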
As an SEO consultant, your task is to formulate the right actions based on insights derived from the keyword analysis. Before creating content, you first want well-founded answers to the following questions:
- What type of page (and functionalities) best fits the search query?
- What search intents does the user have?
- What explicit and underlying latent questions and needs does the searcher have?
- What content formats are best suited to answering the questions?
And, ideally, also:
- Who is searching? (Who is your target audience?)
Before you can formulate the right actions, you first need to know which keywords are semantically related and which keywords meet the same intent and informational needs.
3. What does semantic keyword clustering mean?
Keyword clustering means grouping keywords that are semantically related and address the same search intent.
How does this work?
In 2017, I wrote an article about ‘whisky for beginners’ for the Dutch retailer Gall & Gall. At the time, I was a novice whisky drinker myself, so I drew on my own experience to work out what the article should cover. I asked myself the previously mentioned questions:
What type of page (and functionalities) best fits the search query?
- A blog article
What search intents does the user have?
- Informative and perhaps commercial. The searcher mainly wants to know which whisky she or he can try as a novice whisky drinker.
What explicit and underlying latent questions and needs does the user have?
- How do I know if I like whisky?
- What is good whisky?
- Which whisky is ‘smooth’?
- I don’t understand whisky jargon, so I need to be able to understand what I am reading
- I don’t want to spend too much money
- Where do I start?
What content formats are best suited to answering the questions?
- Text, images and products
Below are the article’s top 18 current rankings, excluding sitelink rankings, four years later.
What is striking about this list and the content of the article?
First of all, the primary keyword ‘whisky for beginners’ is mentioned 0 times in the article. Is this a bad thing? Not necessarily. I forgot to put keywords in the text at the time; instead, I focused on the questions above.
The article furthermore addresses the following topics:
- whisky for beginners / best whisky for beginners
- learning to drink whisky
- whisky flavours
- smooth whisky
- sweet whisky
- tasty whisky
- combination of smooth / tasty / sweet whisky
What makes the article rank in the top 3 for all these search terms?
It’s largely because the semantic relationship and search intent are very similar for all these search queries. In addition, the overarching theme is ‘whisky for beginners’. Beginners have specific questions and needs. For instance, some questions will never be asked by an experienced whisky drinker.
A beginner whisky drinker searches for sweet or smooth whiskies because those are the entry-level whiskies: predominantly easy to drink. Learning to drink whisky is something you only do as a beginner, and the underlying informational need is knowing how to start and which whiskies to start with. ‘Whisky flavours’ or ‘tasty whisky’ are typical search queries used by someone who has little knowledge of whisky.
You could say that Google Search, with the help of RankBrain’s self-learning algorithm and other models, was already sophisticated enough in 2017 to determine the semantic relationship between these search queries. Hence the article ranks for keywords that do not necessarily appear in the article, because the underlying user intent and informational needs are met.
Correct clustering of keywords thus creates a strong start for your content strategy. It yields better insight into how you can organise pages and rank for particular clusters of keywords.
Other benefits of semantic keyword clustering are:
- Stronger rankings for long-tail keywords
- Better understanding of the underlying relationship between clusters
- Improved rankings for short-tail keywords
- More opportunities for internal linking
- Building up expertise and authority in a niche
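The whisky example hints at a practical signal for clustering: when Google ranks the same URLs for two queries, the queries likely share intent and informational needs. A common way to exploit this is SERP-overlap clustering. The following is a minimal Python sketch under stated assumptions, not KeyWI’s algorithm: you already have the top-10 result URLs per keyword, keywords are merged when they share at least `min_shared` URLs, and the function name and sample URLs are hypothetical.

```python
def cluster_by_serp_overlap(
    serps: dict[str, list[str]], min_shared: int = 3, top_n: int = 10
) -> list[list[str]]:
    """Group keywords whose top-N result URLs share at least `min_shared` URLs."""
    keywords = list(serps)
    parent = {kw: kw for kw in keywords}  # union-find forest: keyword -> parent keyword

    def find(kw: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]
            kw = parent[kw]
        return kw

    top_urls = {kw: set(urls[:top_n]) for kw, urls in serps.items()}
    for i, a in enumerate(keywords):
        for b in keywords[i + 1:]:
            if len(top_urls[a] & top_urls[b]) >= min_shared:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, list[str]] = {}
    for kw in keywords:
        clusters.setdefault(find(kw), []).append(kw)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical SERPs; the URLs are placeholders, not real ranking data.
    serps = {
        "whisky for beginners": ["a.example/beginners", "b.example/start", "c.example/guide"],
        "smooth whisky": ["a.example/beginners", "b.example/start", "d.example/smooth"],
        "buy a bike": ["shop1.example/bikes", "shop2.example/bikes", "e.example/tips"],
    }
    print(cluster_by_serp_overlap(serps, min_shared=2))
    # [['whisky for beginners', 'smooth whisky'], ['buy a bike']]
```

The union-find step makes grouping transitive: if A overlaps with B and B with C, all three land in one cluster even when A and C share few URLs. A stricter variant only merges a keyword into a cluster when it overlaps with every member; which one fits better depends on how broad you want your clusters to be.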
4. Semantic keyword clustering
Keyword clustering starts with keyword research. Collect as many relevant keywords as possible, including all variations, long-tail keywords and subtopics.
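Keyword lists merged from several tools usually contain duplicates that differ only in capitalisation or spacing. Below is a minimal Python sketch for normalising and deduplicating such a list before clustering, and writing it as a one-column CSV. The `keyword` header and the sample keywords are assumptions; check which CSV layout your clustering tool expects.

```python
import csv
import io


def normalise(keyword: str) -> str:
    """Lowercase a keyword and collapse runs of whitespace into single spaces."""
    return " ".join(keyword.lower().split())


def dedupe_keywords(raw: list[str]) -> list[str]:
    """Keep the first occurrence of each normalised keyword, preserving order."""
    seen: set[str] = set()
    result: list[str] = []
    for keyword in raw:
        norm = normalise(keyword)
        if norm and norm not in seen:  # skip empty rows and duplicates
            seen.add(norm)
            result.append(norm)
    return result


def to_csv(keywords: list[str]) -> str:
    """Serialise keywords as a one-column CSV with an assumed 'keyword' header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["keyword"])
    writer.writerows([keyword] for keyword in keywords)
    return buffer.getvalue()


if __name__ == "__main__":
    raw = ["Kunstgras leggen", "kunstgras  leggen ", "kunstgras kopen", ""]
    print(to_csv(dedupe_keywords(raw)), end="")
    # keyword
    # kunstgras leggen
    # kunstgras kopen
```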
When performing keyword research, keep ‘relevance’ and ‘search intent’ top of mind, so that marketing budgets are spent wisely on content that brings relevant visitors to the site. It can be a tough job, though, to determine what exactly makes a search query relevant; after all, ‘relevance’ is a commonly used buzzword.
The following factors will help to narrow down the scope and improve accuracy of keyword research:
- The type of site. Is it a platform for inspirational content? Or a news site? Or an e-commerce platform? Or a combination?
- Branded or non-branded site. For example, do you sell a single brand, with one product or several? Or do you host a comparison site offering different brands?
- B2B or B2C. Are you active in B2B or B2C, or perhaps both?
Depending on these factors, your site may not be able to rank for certain search queries in the first place.
The type of page and content formats also play a part in understanding and addressing user intent. Some search queries can only be served by particular types of pages and content modalities, depending on how the user wants to consume content.
An example to illustrate.
Suppose you are hired as an SEO marketer by the imaginary bicycle brand “FastBikey”. The brand hosts a platform with an integrated webshop featuring its own branded products. During keyword research you come across the keyword ‘buy a bike’.
Is this keyword relevant?
It’s clear that established parties dominate the top search results:
- They sell different brands
- They rank with landing pages that offer an overview of a wide range of bicycles and brands
- They offer products as well as provide information that is helpful in the purchasing process
Users searching for “buy a bike” are considering buying a bicycle but don’t have a particular brand preference yet. To meet that need, they would expect an overview featuring images of different bicycles, with copy that meets their explicit and latent informational needs.
FastBikey as an individual bicycle brand cannot meet these informational needs or address user intent. And building a relevant page is simply beyond FastBikey’s scope.
Understanding which keywords are relevant is essential to creating the right semantic content clusters and driving relevant traffic to your site.
Keyword clustering with KeyWI
KeyWI is an AI-driven keyword clustering tool that semantically analyses and groups a set of keywords in just a few minutes.
It takes you from a structured keyword list to a complete keyword clustering in just 2 easy steps. I explain the steps using examples, including immediate insights and concrete actions.
Step 1 – Uploading keyword list
The test set is a Dutch keyword list of 469 keywords on the topic ‘artificial grass’.
First, upload the CSV file containing the keyword set.
Then configure the geolocation settings to your preference. I chose the following setup:
You can specify a domain under ‘Domain rank’, for example a client’s domain name. KeyWI collects ranking data for all keywords in the set for the specified domain. This allows you to analyse the site’s visibility per cluster or even per subcluster.
Ready? Press ‘Cluster’.
Step 2 – Analysing clusters & user intent
The clustering of 469 keywords takes about 2 minutes.
After 2 minutes, KeyWI generates the following visualisation.
The first layer contains the primary content clusters; the second layer contains the subclusters. KeyWI has assigned each content cluster (= circle) a dominant user intent. For example, a blue cluster is predominantly informative.
Of course, a (sub)cluster or individual keyword does not have to correspond to a single intent. The keyword ‘buy a bike’ from the earlier example, for instance, has a mix of informational, commercial and transactional intent.
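How a dominant and a secondary intent could be derived for a (sub)cluster can be sketched in a few lines of Python. This is an assumption about one plausible approach, not KeyWI’s documented method: it counts how many keywords in the cluster carry each intent label (one keyword may carry several) and ranks the labels. The function name and the keyword-to-intent labels in the example are hypothetical.

```python
from collections import Counter


def rank_cluster_intents(keyword_intents: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank intent labels in a (sub)cluster by the number of keywords carrying them.

    A keyword can carry several intents, so it may count towards several labels.
    """
    counts: Counter = Counter()
    for intents in keyword_intents.values():
        counts.update(intents)
    # Highest count first; ties are broken alphabetically so the output is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical intent labels for a 'laying artificial grass' cluster.
    cluster = {
        "laying artificial grass": {"informational", "commercial"},
        "laying artificial grass garden": {"informational", "commercial"},
        "how to lay artificial grass": {"informational"},
    }
    ranked = rank_cluster_intents(cluster)
    print(ranked)  # [('informational', 3), ('commercial', 2)]
    print("dominant:", ranked[0][0], "| secondary:", ranked[1][0])
```

Counting keywords rather than search volume is a design choice of this sketch; weighting each keyword by its monthly search volume would let high-traffic queries weigh more heavily in the dominant intent.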
Select one of the clusters.
In the example below, you see a predominantly informative content cluster with commercial as a secondary intent. The topic is ‘laying (or installing) artificial grass’.
What can be observed:
- It is a predominantly informative cluster with informative subclusters
- Virtually all search terms also have commercial intent, except for keywords that do not contain the word ‘garden’ (Dutch: ‘tuin’) or search terms that are questions.
Two types of subtopics appear:
- ‘artificial grass installation’ (a business does it for you)
- ‘laying artificial grass myself’
If you click the second bubble from the right in the second layer, you go one layer deeper. Among the bubbles is the subcluster “laying artificial grass – generic”.
The keywords are variations of ‘laying artificial grass’. Virtually all have both informative and commercial intent. After all, the keywords alone don’t reveal whether the searchers are looking for a service to install artificial grass for them or prefer to install it themselves. Google’s search results confirm this.
The ads shown meet the commercial intent and the need of individuals or companies looking for a service company that installs artificial grass.
In addition, the top three organic search results are dominated by blog articles that meet the informational intent and give searchers tips on how to install artificial grass themselves.
Another subcluster in the same primary cluster ‘laying artificial grass’ is ‘laying artificial grass garden’.
This subcluster is specifically about installing artificial grass in the garden, which gives a stronger indication that the searchers are individuals who want to install artificial grass in their garden themselves.
However, Google returns more or less the same results as for the previous subcluster, namely blog articles about installing artificial grass yourself.
What insights and actions can you derive from this?
- ‘Laying artificial grass’ is the dominant topic
- Informational intent focuses on installing artificial grass yourself; commercial intent focuses on having artificial grass installed for you, as an individual or a business
- Within the topic ‘laying artificial grass yourself’, KeyWI generates subclusters with long-tail keywords
- Long-tail keyword subclusters help you gain insight into the specific informational needs within the topic of ‘laying artificial grass yourself’
Actions for organic content 
- Page type: a blog article
- Content type: a how-to article or guide that explains how to lay artificial grass
- Keywords: the keywords of the informative subclusters that are part of ‘laying artificial grass’
In a similar vein, you can easily analyse other content clusters.
Extra – search visibility analysis
Would you like to know how the specified domain performs in Google? KeyWI also visualises the average ranking per (sub)cluster.
Is a cluster grey? Then the entire (sub)cluster does not rank.
The above visualisation provides you with direct insight into possible opportunities for content creation and optimisation.
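The average-ranking view can be approximated in a few lines of Python as well. The minimal sketch below assumes you have one Google position per keyword for the specified domain, with `None` for keywords the domain does not rank for; it averages only the keywords that rank and returns `None` for the ‘grey’ case. Whether KeyWI treats non-ranking keywords the same way is an assumption.

```python
from statistics import mean
from typing import Optional


def average_rank(rankings: dict[str, Optional[int]]) -> Optional[float]:
    """Average Google position over the keywords in a (sub)cluster that rank.

    Keywords the domain does not rank for are passed as None and left out.
    Returns None when no keyword ranks at all (the 'grey' case).
    """
    positions = [pos for pos in rankings.values() if pos is not None]
    return mean(positions) if positions else None


if __name__ == "__main__":
    generic = {"buying artificial grass": 1, "ordering artificial grass": 1}
    print(average_rank(generic))  # 1
    print(average_rank({"hypothetical keyword": None}))  # None
```

Leaving non-ranking keywords out keeps the average meaningful, but it can flatter a cluster where only one of many keywords ranks; reporting the share of ranking keywords alongside the average avoids that blind spot.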
High-performing pages and search terms are worth evaluating too.
The following image shows an example of the green subcluster ‘buying online’ of the parent subcluster ‘buying’ of the commercially dominant cluster ‘artificial grass generic / buying’.
Something odd happened: KeyWI has attributed only navigational intent to the first three terms, which contain ‘kunstgras online’. You would expect commercial or transactional intent, as the three search terms are about buying or ordering artificial grass online.
Another subcluster, containing only the related generic keywords ‘buying artificial grass’ and ‘ordering artificial grass’, has an average ranking of 1. These keywords have both commercial and transactional intent.
Both subclusters’ keywords target the same page.
In summary, the following can be observed:
- Search terms with ‘artificial grass online’ have navigational intent
- Specific search terms containing ‘online’ underperform compared to the generic search terms ‘buying artificial grass’ and ‘ordering artificial grass’
- Commercial and transactional intent are attributed to generic terms that do not include ‘online’.
An examination of the search results in Google reveals a case of an exact-match domain (EMD). In such cases, the match with the domain name can induce users to click on that search result. After all, users are specifically looking for artificial grass they can order online.
But is that the reason it ranks first? And does this mean that users were originally searching specifically for the company Kunstgras Online? Probably not, or only some of them. And how do Google’s algorithms interpret such search behaviour?
These are logical questions, but they are beyond the scope of this article.
More important is the conclusion that it is more difficult to rank first for search terms containing ‘kunstgras online’.
However, it is possible to carry out on-page optimisations that boost rankings for these search queries. For example, Kunstgras Direct could optimise its meta title, “Always greener than your neighbours’ lawn!”, which isn’t really helpful.
Screenshot: SERP ranking domain ‘kunstgrasdirect’
Also, there is no mention of ordering artificial grass online on the landing page.
KeyWI generates intuitive, actionable insights for content optimisation and content creation. It is fast, easy to use and above all data-driven.
The automatic semantic keyword clustering with KeyWI not only saves time, but also improves the quality of semantic clustering.
Other advantages of automatic keyword clustering with KeyWI:
- Use the tool for all languages, locations and regions: it is language-independent.
- Eliminate clustering errors
- Receive a set of clusters and subclusters within minutes
- Maximise the number of keywords to rank for
- No more irrelevant keywords in the same cluster
- Insight into user intent for both individual search terms and (sub)clusters
- Insight into the visibility of your domain for both keywords and (sub)clusters
1. In 2017, this method was predominantly used in SEO.
2. Other ranking factors such as URL or domain authority, internal links and backlinks contribute to the rankings too.
3. The example of kunstgrasdirect.nl is purely illustrative. Neither Bartjan nor KeyWI is associated with kunstgrasdirect.nl.
4. All screenshots are in Dutch.
5. In Dutch: ‘kunstgras leggen’
6. The first result is a featured snippet. The #2 above takes the original organic #1 position in Google.
7. ‘Zelf kunstgras leggen’ is Dutch for ‘laying artificial grass yourself’
8. The automatic generation of actions is a feature currently in development.
9. ‘Kunstgras online’ is Dutch for ‘artificial grass online’
10. In 2012, Matt Cutts announced an algorithm change designed to reduce the number of low quality EMDs in search results. EMDs can still have a positive effect as long as sites have authority and create quality content.
I am Bartjan Sonneveld, founder of templatesseo.com. I work as an international SEO consultant and freelancer across Europe for corporations and SMEs worldwide, helping them with complicated technical matters and international strategy. I also help agencies and businesses become better at data analysis for SEO purposes. I love getting funky with regular expressions, XPaths and queries in Google Sheets, which encouraged me to create numerous sheet templates that automate tedious tasks or improve the quality of data insights.