
Exploiting LSI to rank higher

So your site is up and running and it’s the best thing since the invention of the VCR pause button. You’ve got SEO-friendly site architecture, great content and some features your competition hasn’t. The only elusive element is that high-traffic phrase you’ve been trying to rank for. You’ve got loads of links with your keyword as anchor text and it’s plastered all over your site, so what the hell gives? Sound familiar?

Going a bit deeper into Google
I want to briefly go over something called Latent Semantic Indexing (LSI) which, although it sounds like an incredibly boring and somewhat silly acronym, is actually really important. “Semantics” is the study of meaning in language, so what LSI really stands for is “examining the potential meanings and connections of a load of different words, then putting them into a giant interconnected hierarchy and ranking system that wouldn’t fit on even a really big bit of paper”, which is why I shall refer to it as LSI from now on.

The bottom line here is that Google spiders billions of blogs, news sites, documents and web pages, then crunches all of this textual data and tries to work out which words are related to each other and which words are related to a certain subject, and has a stab at working out the context in which words are being used.

“I don’t believe you, Google’s not that clever”
Here’s a neat trick. Try doing a search for a keyword in your niche and put a tilde (~) in front of the keyword. This will scratch the surface of what the LSI part of Google’s algorithm is doing.

Let’s try a search for ~holidays:


You can see Google has bolded the words Holidays and Flights. Google has worked out that the word “holidays” is related to the word “flights” (round of applause). You can continue going down this path, with a search for ~flights which will show you Google knows that the word “flights” is related to “fares” and so on.

You can imagine that, at the scale Google retrieves data and with the billions upon billions of pages it reads, it has quite enough data to make some fairly accurate calculations about the connections between words, and specifically about what other words it should be looking for when spidering a site on a specific subject.

Although it’s a hell of a lot more advanced than this, you can look at the co-occurrence of words, by seeing how many pages there are with keyword A, how many pages with keyword B and how many pages with both keyword A and B on.


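So, taking A as the number of pages containing “car”, B as the number containing “insurance” and C as the number containing both, we can calculate the co-occurrence of the two words by doing: C / ((A+B)-C) [I'll let you do that]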

What you’ll see is that the words “car” and “insurance” go together like carrots and peas, whereas;


“car” and “spoon” are not quite so happily married. If you’re interested in the real dirty maths behind co-occurrence I’d have a look at this.
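
If you’d rather not do the sum by hand, here’s a minimal Python sketch of that same C / ((A+B)-C) calculation. The function name and the page counts are made up purely for illustration (they are not real Google figures), and whatever Google actually does is a hell of a lot more advanced than this.

```python
def cooccurrence(pages_a: int, pages_b: int, pages_both: int) -> float:
    """Return C / ((A + B) - C).

    pages_a    -- A: pages containing the first word
    pages_b    -- B: pages containing the second word
    pages_both -- C: pages containing both words
    """
    if pages_both > min(pages_a, pages_b):
        raise ValueError("pages_both cannot exceed either single-word count")
    either = pages_a + pages_b - pages_both  # pages containing at least one word
    if either == 0:
        return 0.0
    return pages_both / either


# Hypothetical page counts, invented for this example only.
car_insurance = cooccurrence(pages_a=1_000, pages_b=400, pages_both=200)
car_spoon = cooccurrence(pages_a=1_000, pages_b=100, pages_both=1)

print(f"car + insurance: {car_insurance:.4f}")
print(f"car + spoon:     {car_spoon:.4f}")
```

With these invented counts the first pair scores about 0.17 and the second about 0.001. The closer the score is to 1, the more often the two words turn up on the same pages.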

How does knowing this boring stuff help me?
If you’re still pummelling links and optimising for your trophy term, Google is going to have your site for breakfast and you’ll be pooped out near the bottom of the SERPs. It is always worth bearing in mind that Google’s mission is to “deliver quality, relevant results” and this is one way they are trying to fish out people with bad link profiles and shaky content and it’s your job to stay one step ahead!

It is always best to build your rankings from “bottom up”, meaning you target all of the niche terms around your main trophy phrase, before you go charging in. Take the example that you’re building a travel advice website and you want to rank well for “travel advice”. Google knows what is related to travel such as hotels, resorts, tourism and culture content – step into the Googlebot’s shoes for a moment if you will:

Which site is more likely to hold more relevant information on the broad phrase “travel advice”?

Site A: This site has 15,000 incoming links – 12,000 of which have the anchor text “travel advice”. They have a lot of mentions of the words “advice” and “travel”, with some mention of hotels and resorts – but not many links to verify this other content.

Site B: I already rank this site well for “best hotels in Europe” and “best travel insurance deals for Europe”. These pages have over 8,000 incoming links, all with different travel-related terms, so I can verify this is good content. This good content is related to the “travel advice” search, the link profile looks more natural, and they have 7,000 incoming links for “travel advice”.
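
To put rough numbers on that comparison, here’s a small Python sketch that works out how much of each site’s link profile is taken up by the trophy phrase. It assumes Site B’s roughly 8,000 varied links and its 7,000 “travel advice” links are separate sets, and the function name and dictionary keys are my own invention for illustration. Treat it as a back-of-the-envelope check, not as how Google scores anything.

```python
def anchor_share(anchor_counts: dict[str, int], phrase: str) -> float:
    """Fraction of all incoming links that use `phrase` as anchor text."""
    total = sum(anchor_counts.values())
    if total == 0:
        return 0.0
    return anchor_counts.get(phrase, 0) / total


# Link counts taken from the Site A / Site B example.
# Assumption: the two groups of links for each site do not overlap.
site_a = {"travel advice": 12_000, "everything else": 3_000}
site_b = {"travel advice": 7_000, "varied travel terms": 8_000}

share_a = anchor_share(site_a, "travel advice")
share_b = anchor_share(site_b, "travel advice")

print(f"Site A: {share_a:.0%} of links say 'travel advice'")
print(f"Site B: {share_b:.0%} of links say 'travel advice'")
```

Site A comes out at 80% and Site B at about 47%, which is the “more natural” link profile the Googlebot in our example is looking at.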

Higher-traffic key terms tend to be a lot more generic in nature, so Google really has to kick in some AI and try to work out what the user is searching for. It does this by using data from the billions of pages it has indexed. If you can get your head around how LSI is working, you can really lay a nice trap for Google and make it come to you, rather than you chasing it with hundreds of spammy links.

The sites I’ve had the best SEO success with are the ones where I started by aiming low, getting ranked for all the long-tail terms I could pick up, then moving on to the big boys once the site had proved itself to Google. You’ll find that grabbing these long-tail terms will also provide you with a higher quality of traffic, which some people tend to overlook when dashing after the big phrases.

So here’s a checklist:

  • Have a think about your niche and try some keyword research tools to get variations
  • Look at your competitors that are ranking well. What content do they have? What do they rank for?
  • Have a play in Google using the tilde (~) to see where the big connections are and follow these breadcrumbs
  • Try buying a few key phrases with AdWords, then accurately measure what traffic they bring and how well they convert
  • Vary your link building to specific pages so Google can get a grip of your content
  • Check for common misspellings (e.g. Google knows that “smileys” are the same as “smilies”)
  • Blow the dust off the thesaurus!


14 responses to “Exploiting LSI to rank higher”

  • Sam says:

    Great Article, been putting LSI at use for a while now.

    Comment by Sam
    April 27th, 2007 @ 4:20 am

  • Anders says:

    Nice one, thanks!

    It seems that this semantic search (~) only works in English. Well it does not work in Danish at least.
    Yet… :)

    Comment by Anders
    April 27th, 2007 @ 5:30 am

  • Colin Maddocks says:

    I found your explanation really interesting as I am having problems getting out Nile Cruise site on the first page on Google even though its’ a PR 1, has bags of content, changes regularly, etc, etc. I use Adwords for the main phrases we want to be found for, “Nile Cruise”, “Nile Cruise”, etc, but am getting nowhere. We do OK with “nile cruise bargains” (1st or 2nd page) but perhaps we need to follow your advice and go for long tail first. Thanks for the concise explanation.

    Comment by Colin Maddocks
    April 27th, 2007 @ 9:58 am

  • Mark says:

    That’s great, Colin. Keep us updated on your progress!

    Comment by Mark
    April 27th, 2007 @ 9:11 pm

  • Ray Dotson says:

    This is absolutely brilliant and well written, too. Great article!

    BTW, I found your blog by way of the Dosh Dosh article on your video blog instructions (also brilliant). Thanks for the great post. I’ll be back…

    Comment by Ray Dotson
    May 4th, 2007 @ 9:04 pm

  • Who here actually runs long term content websites? - Page 7 - WickedFire - Affiliate Marketing Forum - Internet Marketing Webmaster SEO Forum says:

    [...] a really good article on LSI Digerati Marketing » Exploiting LSI to rank higher __________________ You had a crush on Recessive Pat?! I had a crush on Recessive Pat!! We should [...]

    Comment by Who here actually runs long term content websites? - Page 7 - WickedFire - Affiliate Marketing Forum - Internet Marketing Webmaster SEO Forum
    May 28th, 2007 @ 1:20 am

  • Forumfads says:

    “How does knowing this boring stuff help me?” -LOL

    Otherwise great blog! LSI is definitly interesting subject nowadays.

    Comment by Forumfads
    July 7th, 2007 @ 12:41 pm

  • Mark says:

    Hehe thanks Forumfads, I try to put a little bit of humour into the posts, otherwise it can get a bit tedious! Thanks for the comment. Welcome to the digerati :)

    Comment by Mark
    July 7th, 2007 @ 4:43 pm

  • Harry says:

    “best thing since the invention of the VCR pause button”… They put a pause button on my VCR! I dont have to cross my legs during 4 hour movies… omg!

    Comment by Harry
    July 23rd, 2007 @ 9:10 pm

  • Mark says:

    Not quite what I was getting at, Harry…. Think harder ;-)

    Comment by Mark
    July 23rd, 2007 @ 9:17 pm

  • Spoof says:

    Great info, but i dont think the ~ search is still returning the same results

    Comment by Spoof
    August 8th, 2007 @ 12:37 pm

  • Mark says:

    Using the tilde returns the same results, but bold certain words.

    Comment by Mark
    August 9th, 2007 @ 9:04 am

  • How To Dupe Content And Get Away With It | MegaBlogg Free Blog Host says:

    [...] post. So I won’t talk too much about it, but definitely consider doing some research on exploiting LSI in this [...]

    Comment by How To Dupe Content And Get Away With It | MegaBlogg Free Blog Host
    August 30th, 2007 @ 6:56 pm

  • Alison says:

    Hey, a tip to help boost your site even more is HitTail.com. Along with the tips posted in this blog, and comments… I can’t wait to see my status rise in on the google page!

    Comment by Alison
    September 20th, 2007 @ 4:47 pm