Exploiting LSI to rank higher
Friday, April 27th, 2007
So your site is up and running and it’s the best thing since the invention of the VCR pause button: you’ve got SEO-friendly site architecture, great content and some features your competition hasn’t. The only elusive element is that high-traffic phrase you’ve been trying to rank for. You’ve got loads of links with your keyword anchor text and it’s plastered all over your site, so what the hell gives? Sound familiar?
Going a bit deeper into Google
I want to briefly go over something called Latent Semantic Indexing (LSI), which although it sounds like an incredibly boring and somewhat silly acronym, is actually really important. “Semantics” is the study of meaning in language, so what LSI really stands for is “examining the potential meanings and connections of a load of different words, then putting them into a giant interconnected hierarchy and ranking system that wouldn’t fit on even a really big bit of paper”, which is why I shall refer to it as LSI from now on.
The bottom line here is that Google spiders billions of blogs, news sites, documents and web pages, crunches all of this textual data and tries to work out which words are related to each other and which words belong to a certain subject, and has a stab at working out the context in which words are being used.
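For the curious, textbook LSI boils down to a singular value decomposition (SVD) of a giant term-document matrix. Here’s a minimal Python sketch on a tiny made-up corpus (all the counts are invented purely for illustration) showing the effect: words that keep the same company end up close together in the reduced “concept” space.

```python
import numpy as np

# Tiny made-up term-document matrix: rows are terms, columns are documents.
# Docs 1-2 are travel pages, docs 3-4 are car insurance pages.
terms = ["holidays", "flights", "hotels", "fares", "car", "insurance"]
A = np.array([
    [1, 1, 0, 0],  # holidays
    [1, 1, 0, 0],  # flights
    [1, 0, 0, 0],  # hotels
    [0, 1, 0, 0],  # fares
    [0, 0, 1, 1],  # car
    [0, 0, 1, 1],  # insurance
], dtype=float)

# LSI: a truncated SVD projects terms into a small "concept" space
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                          # keep the two strongest concepts
term_vecs = U[:, :k] * s[:k]   # each term as a point in concept space

def sim(t1, t2):
    """Cosine similarity between two terms in concept space."""
    a, b = term_vecs[terms.index(t1)], term_vecs[terms.index(t2)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(sim("holidays", "flights"))    # ~1.0: strongly related
print(sim("holidays", "insurance"))  # ~0.0: unrelated
```

Google’s production system is obviously far more sophisticated than a toy SVD, but the basic idea of related words clustering together is the same.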
“I don’t believe you, Google’s not that clever”
Here’s a neat trick. Try doing a search for a keyword in your niche and put a tilde (~) in front of the keyword. This will scratch the surface of what the LSI part of Google’s algorithm is doing.
Let’s try a search for ~holidays
[Screenshot: Google results for ~holidays, with “Holidays” and “Flights” bolded]
You can see Google has bolded the words Holidays and Flights. Google has worked out that the word “holidays” is related to the word “flights” (round of applause). You can continue going down this path, with a search for ~flights which will show you Google knows that the word “flights” is related to “fares” and so on.
You can imagine that, at the scale Google retrieves data and given the billions upon billions of pages it reads, it has quite enough data to make some fairly accurate calculations about the connections between words, and specifically about which other words it should expect to see when spidering a site on a particular subject.
Although it’s a hell of a lot more advanced than this, you can get a feel for the co-occurrence of words by counting how many pages contain keyword A, how many contain keyword B, and how many contain both keyword A and keyword B.
[Diagram: overlapping circles of pages containing keyword A, pages containing keyword B, with the overlap C containing both]
So if A is the number of pages containing “car”, B the number containing “insurance” and C the number containing both, we can calculate the co-occurrence of “car” and “insurance” by doing: C / ((A + B) − C) [I'll let you do that]
What you’ll see is that the words “car” and “insurance” go together like carrots and peas, whereas:
[Diagram: pages containing “car” and pages containing “spoon” barely overlap]
“car” and “spoon” are not quite so happily married. If you’re interested in the real dirty maths behind co-occurrence I’d have a look at this.
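Here’s that sum as a quick Python sketch. The page counts are completely made up; it’s the relative scores that matter:

```python
def co_occurrence(a, b, both):
    """Pages containing both words, divided by pages containing either
    word (i.e. C / ((A + B) - C), as above)."""
    return both / (a + b - both)

# Invented page counts, purely for illustration
print(co_occurrence(500_000_000, 300_000_000, 120_000_000))  # car + insurance: ~0.18
print(co_occurrence(500_000_000, 40_000_000, 200_000))       # car + spoon: ~0.0004
```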
How does knowing this boring stuff help me?
If you’re still pummelling links and optimising for your trophy term alone, Google is going to have your site for breakfast and you’ll be pooped out near the bottom of the SERPs. It is always worth bearing in mind that Google’s mission is to “deliver quality, relevant results”; this is one way they are trying to fish out people with bad link profiles and shaky content, and it’s your job to stay one step ahead!
It is always best to build your rankings from the “bottom up”, meaning you target all of the niche terms around your main trophy phrase before you go charging in. Say you’re building a travel advice website and you want to rank well for “travel advice”. Google knows what is related to travel, such as hotels, resorts, tourism and culture content. Step into the Googlebot’s shoes for a moment, if you will:
Which site is more likely to hold more relevant information on the broad phrase “travel advice”?
Site A: This site has 15,000 incoming links – 12,000 of which have the anchor text “travel advice”. They have a lot of mentions of the words “advice” and “travel”, with some mention of hotels and resorts – but not many links to verify this other content.
Site B: I already rank this site well for “best hotels in Europe” and “best travel insurance deals for Europe”. These pages have over 8,000 incoming links, all with different travel-related terms, so I can verify this is good content. This good content is related to the “travel advice” search, the link profile looks more natural, and they have 7,000 incoming links for “travel advice”.
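To put a number on “looks more natural”, here’s a toy Python sketch scoring each site’s link profile by the entropy of its anchor-text distribution. This is an illustrative heuristic of my own, not anything Google has confirmed it uses:

```python
import math
from collections import Counter

def anchor_diversity(anchors):
    """Shannon entropy (in bits) of the anchor-text distribution;
    higher means a more varied, natural-looking link profile.
    A toy heuristic only, not a published Google signal."""
    counts = Counter(anchors)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Site A: 12,000 of 15,000 links use the same anchor text
site_a = ["travel advice"] * 12_000 + ["miscellaneous"] * 3_000
# Site B: links spread across many travel-related phrases
site_b = (["travel advice"] * 7_000
          + ["best hotels in Europe"] * 4_000
          + ["best travel insurance deals for Europe"] * 2_500
          + ["European resorts"] * 1_500)

print(anchor_diversity(site_a))  # ~0.72 bits: heavily skewed
print(anchor_diversity(site_b))  # ~1.78 bits: far more varied
```

Site B scores higher simply because no single phrase dominates its links, which is the pattern you’d expect from links earned naturally.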
Larger-traffic key terms tend to be a lot more generic in nature, so Google really has to kick in some AI and work out what the user is actually searching for; it does this using data from the billions of pages it has indexed. If you can get your head around how LSI is working, you can lay a nice trap for Google and make it come to you, rather than you chasing it with hundreds of spammy links.
The sites I’ve had the best SEO success with are the ones where I started by aiming low, picking up all the long-tail terms I could, then moved onto the big boys after proving myself to Google. You’ll find that grabbing these long-tail terms also brings you a higher quality of traffic, something people tend to overlook when dashing after the big phrases.
So here’s a checklist:
- Have a think about your niche and try some keyword research tools to get variations
- Look at your competitors that are ranking well. What content do they have? What do they rank for?
- Have a play in Google using the tilde (~) to see where the big connections are and follow these breadcrumbs
- Try buying a few key phrases with AdWords, accurately measure the traffic they bring and see how well they convert
- Vary your link building to specific pages so Google can get a grip of your content.
- Check for common misspellings (e.g. Google knows that “smileys” are the same as “smilies”); see the quick sketch after this list
- Blow the dust off the thesaurus!
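On the misspellings point, here’s a small stdlib-only Python sketch for sanity-checking whether two spellings are plausibly the same word. It’s just a rough string-similarity ratio, nothing to do with how Google actually folds spellings together:

```python
import difflib

def spelling_similarity(a, b):
    """Rough string similarity between 0 and 1, via Python's stdlib."""
    return difflib.SequenceMatcher(None, a, b).ratio()

print(spelling_similarity("smileys", "smilies"))   # ~0.86: probably the same word
print(spelling_similarity("smileys", "holidays"))  # ~0.4: probably not
```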