Saturday, May 31st, 2008
Right, let’s kick this thing in the nuts. Wouldn’t it be great if you could have a decent list of SEO Ranking Factors and, more specifically, one that tells you exactly what you need to rank for a key phrase?
Well, SEOMoz went and done this.
You’ve probably all seen it before, the famous SEOMoz Search Ranking Factors, the highly regarded opinions of 37 leaders of search spread over a bunch of questions. It sounds slick, it looks cool and it’s a great introduction to SEO. There is, however, a rather major problem. None of them pissing agree! 37 leaders in search, closed-ended questions, yet almost ALL of the answers have only “average agreement”. Just look at the pie charts at the end: there is massive dispute over the correct answer.
I find this interesting. It leaves two possibilities:
1) SEOMoz’s questions are flawed and there is no “correct” answer – this kind of kills the whole point of the project.
2) If there is a “correct” answer, then it would seem that 25%-50% of “leading people in search” don’t know WTF they are talking about.
Now before I continue, I’m not going to claim I have all the answers, far, far from it. I do some stuff and that stuff works well for me. The other thing I would like to point out is that I actually really like the SEOMoz blog and I think they provide extremely high quality content in high frequency, which is bloody hard to do. So please no flaming when I seem to be bashing their hard work, I’m simply pointing out a few things rather crudely. Oh, they’re nice people too, Jane is very polite when I stalk her on Facebook IM.
Anyway, back to slating. I think it is very hard to give quality answers to questions such as, how does page update frequency affect ranking? From my experience, I’ve found Google quite adaptive in knowing, based on my search query, whether it should serve me a “fresh” page or one that’s collecting dust. Eli from BlueHatSEO has also made some convincing arguments that the “optimum” update frequency of a page depends on your sector/niche.
Also, these things change. Regularly. Those clever beardies at Google are playing with those knobs and dials all the time. Bastards.
Okay, I now hate you for slating SEOMoz, do you have anything useful to say?
Maybe? Maybe not. As I mentioned in my last post, I’m going to talk about some projects I’m working on at the moment and one of these is specifically aimed at getting some SEO Ranking Factors answers.
I could of course just give what I believe to be the “correct” answers to the SEO Ranking Factors questions, but like everyone else, I’d be limited to my own SEO experience. We need more data, more testing, more evidence.
There’s loads of little tools floating around the net that will tell you little things like, if you have duplicate meta descriptions, your “keyword density” (hah), how many links you have, all that stuff. Then you’ll get some really helpful advice like “ShitBOT has detected your keyword only 3.22% on this page, you should mention your keyword 4.292255% for optimum Googleness”. Yes, well. Time to fuck off ShitBOT. These tools are kind of fragmented over the net, so it would take ages to run all 101 to build up a complete “profile” of your website, which really… Wouldn’t tell you all that much. It wouldn’t tell you much because you’re only looking at your own website, your own ripples in the pond. You need to zoom out a bit, get in a ship and sail back a bit, then maybe put your ship in a shuttle, blast off until you can see the entire ocean.
Well, crap. It all looks different from here..
Creating a Technological Terror
I can’t do this project alone. Fortunately, one of the smartest SEO people I know moved all the way across the country to my fine city and is going to help.
Here we go….
1) Enter the keyword you would like to rank for.
2) We will grab the top 50 sites in Google for this search term.
2) i) First of all, we will do a basic profile of these sites, very similar to, but a bit more in depth than, the data SEOQuake will give you. So things like domain age, number of sites linking to domain, how these links are spread within the site, page titles, amount of content, update frequency, PageRank etc. We’ll also dig a bit deeper and take titles and content from pages that rank for these key phrases and store them for later.
2) ii) The real work begins here. For each one of these sites that rank, we are going to look at the second tier, which I don’t see many people doing. We are going to analyse all of the types of sites that link to these sites that rank well. This will involve: Doing the basics, such as looking at their vital stats, so their PR, links, age of domain, TLD and indexed pages.
Then we’re going to take this a step further. We are going to be scanning for footprints to work out the type of link. This means, is it an image link? Is it a link from a known social news site like Digg or Reddit? Is it a link from a social bookmarking site like StumbleUpon or Delicious? Is it a link from a blog? Is it a link from a forum? A known news site? Is it a link from a generic content page? If so, let’s use some language processing and try and determine if it’s a link from a related content page, or a random ringtones page. Cache all of this data.
3) We have a huge amount of data now, we need to process it. Say we’re looking at the sites ranking for the keyterm casino: let’s put them onto a graph showing their actual ranking for this keyterm vs their on-page vital stats. Let’s see the ranking vs the types of links they have. Let’s see how the sites rank vs the amount of links, the age of links etc.
4) We can take this processing to any level needed. Let’s pool together all the data we have of the 50 sites and take averages. What do they have in common for this search term? Are these common ranking factors shared between totally different niches and keywords?
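As a rough idea of what the footprint scan in step 2) ii) could start out as, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the function name `classify_link`, the domain sets and the URL hints are not a vetted footprint list, and a real scan would look at the page markup too, not just the URL.

```python
from urllib.parse import urlparse

# Hypothetical footprint lists, purely illustrative.
SOCIAL_NEWS = {"digg.com", "reddit.com"}
SOCIAL_BOOKMARKS = {"stumbleupon.com", "delicious.com"}
FORUM_HINTS = ("/forum", "/viewtopic", "/showthread")
BLOG_HINTS = ("/blog", "wordpress", "blogspot")


def classify_link(source_url, is_image=False):
    """Guess the type of a link from the URL of the page it sits on."""
    if is_image:
        return "image"
    parsed = urlparse(source_url.lower())
    host = parsed.netloc
    if host.startswith("www."):
        host = host[4:]
    if host in SOCIAL_NEWS:
        return "social_news"
    if host in SOCIAL_BOOKMARKS:
        return "social_bookmark"
    # Fall back to crude substring hints in the host and path.
    haystack = host + parsed.path
    if any(hint in haystack for hint in FORUM_HINTS):
        return "forum"
    if any(hint in haystack for hint in BLOG_HINTS):
        return "blog"
    return "generic"
```

Anything that lands in the "generic" bucket is where the language processing would have to take over, to tell a related content page from a ringtones page.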
This is the type of information that I think I know. I think it would be valuable to actually know the information I think I know (=
So I guess you can expect a lot of playing with the Google Charts API, scatter graphs showing link velocity against domain age and total links and all that shit.
You get the idea.
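Before any charts, one cheap way to put a number on "ranking vs vital stats" is a rank correlation. The sketch below uses Spearman's rank correlation, which is my choice for illustration and not something the project has committed to; the function names are made up and the sample figures in the usage note are invented.

```python
def ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank series."""
    return pearson(ranks(xs), ranks(ys))
```

Feed it Google positions and one metric, e.g. `spearman([1, 2, 3, 4, 5], domain_ages)`. A value near -1 would mean older domains tend to sit higher up the results for that keyterm, a value near 0 would mean the metric tells you nothing. Correlation only, of course: it says nothing about why.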
There’s actually all kinds of other secondary analysis that can be pumped into this data. For instance, even though it’s a kind of made up term, I think “TrustRank” has some sauce behind it. (There’s a good PDF on TrustRank here). Let’s think of it in very, very simple, non-mathematical terms for a moment.
One fairly basic rule of thumb for the web can be that a trusted (“good”) site will generally not link to a “bad” (spam, malware, crap) site. It makes sense: generally, very high quality websites vet the other sites that they link to. So it makes sense that Google select a number of “seed” sites and give them a special bit of “trust” juice, which says that whatever site this one links to is very likely to be of good quality. This trend continues down the chain, but obviously the further down this chain you get, the more likely it is that this rule will be broken and someone (maybe even accidentally) will link to what Google considers a “bad” website. For this reason, the (and I use this terminology loosely) “Trust” that is passed on will be dampened at each tier. This allows a margin for calculated error, so if the chain is in essence broken, the algorithm maintains its quality, because it allows for this.
I think most people could name some big, trusted websites. Why not take time to research these sites, really trusted authority sites – ones that it’s at least a fair bet have some of this magical Trust? Say we have a list of ten of these sites, why not crawl them and get a list of every URL that they link to? Why not then crawl all of these URLs and get a list of all the sites THEY link to? Why not grab the first 3 or 4 “tiers” of sites? Great, now you’ve probably got a few million URLs. Why not let Google help us? Let’s query these URLs against the keywords we’re targeting. What you’re left with is a list of pages from (hopefully) trusted domains, that are related to your niche. The holy grail of whitehat link building. Now pester them like a bastard for links! Offer content, blowjobs, whatever it takes!
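To make the dampening idea concrete, here is a toy Python sketch. It is not the algorithm from the TrustRank paper and not anything Google is known to run; it just hands every seed a trust of 1.0 and multiplies by a damping factor at each tier. The function name, the 0.5 damping value and the four-tier cut-off are all assumptions picked for illustration.

```python
from collections import deque


def propagate_trust(links, seeds, damping=0.5, max_tiers=4):
    """Toy trust propagation over a link graph.

    links maps a site to the list of sites it links to. Each site gets the
    trust of its shortest path from a seed: damping ** tier.
    """
    trust = {seed: 1.0 for seed in seeds}
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        site, tier = queue.popleft()
        if tier == max_tiers:
            continue  # stop passing trust on beyond the last tier
        passed_on = trust[site] * damping
        for target in links.get(site, ()):
            # Breadth-first order means the first visit is the closest tier.
            if target not in trust:
                trust[target] = passed_on
                queue.append((target, tier + 1))
    return trust
```

With a chain like seed → a → b → spam and a damping of 0.5, the spam site three tiers down still picks up some trust, but only an eighth of what the seed started with. That is the margin for error in action.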
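The tier-by-tier crawl could be sketched like this. Big caveats: `fetch_links` and `fetch_text` are hypothetical callables you would have to supply (a real crawler, a cache, whatever), and the keyword check is a plain substring match standing in for the "let Google help us" step, not an actual Google query.

```python
def crawl_tiers(seeds, fetch_links, max_tiers=3):
    """Return {url: tier} for every URL found within max_tiers hops of the seeds.

    fetch_links(url) is a hypothetical callable returning the URLs a page links to.
    """
    seen = {seed: 0 for seed in seeds}
    frontier = list(seeds)
    for tier in range(1, max_tiers + 1):
        next_frontier = []
        for url in frontier:
            for target in fetch_links(url):
                if target not in seen:
                    seen[target] = tier
                    next_frontier.append(target)
        frontier = next_frontier
    return seen


def link_prospects(tiers, fetch_text, keywords):
    """Keep non-seed URLs whose text mentions a keyword, closest tier first.

    fetch_text(url) is a hypothetical callable returning the page text.
    """
    wanted = [keyword.lower() for keyword in keywords]
    hits = [
        url
        for url, tier in tiers.items()
        if tier > 0 and any(k in fetch_text(url).lower() for k in wanted)
    ]
    return sorted(hits, key=lambda url: (tiers[url], url))
```

Sorting by tier puts the pages closest to the trusted seeds at the top of the pestering list.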
Wouldn’t it be interesting if we took this list of possible Trusted sites and tied in this theory with how many of our tendrils of trusted networks link to our high-ranking pages? There’s a lot of possibilities here.
This project will be taking up a significant chunk of my time over the next months. Maybe the data will be shit and we won’t find any patterns and it will be a giant waste of time. At least then I can say with confidence that SEO is actually just the charm-clasping, pointy hat-wearing, pole chanting black art that so many businesses seem to think it is. At least I’ll be one step closer to finding out.
Apologies once again to SEOMoz if you took offense. I love you x