Lost Feedburner Subscribers?
January 21, 2009
Anyone else having problems with Feedburner? Overnight, I’ve (apparently) dropped from ~1200 subscribers to ~500. A quick Google search shows lots of people reporting the same problem.
Anyone know what’s going on – or had a similar experience?
SEO Tips For Business and More
January 13, 2009
As long-time readers will know, I’ve mainly used Digerati as a platform to publish articles on the more techy and blackhat side of SEO. That may not suit everyone, but it’s the side of things that interests me enough to post about from a personal perspective.
Those of you who really pay attention will know I’m a director at Further Search Marketing. Further have just put their new website live, and we’re making inroads towards building a really decent SEO blog there.
If you’ve enjoyed my writing style and content so far, it may be worth you subscribing to the Further SEO Blog, as I’ll be posting there more regularly than on Digerati.
I’ve already done what I would consider two “good” posts:
Banned in Google? The Complete Guide: This article is aimed at those who have suffered a penalty or ban in Google, covering how to get listed again as quickly as possible.
The Google Sitelinks Guide – This is (as far as I’m aware) one of the most detailed posts on Google Sitelinks, how they work, how you can get them and what effect they seem to have on your site.
The posts on Further will be more business-orientated, with less technical focus but just as much detail (and less swearing).
So, if you’ve been an avid reader of Digerati, you can now get more – in your browser or in your feed reader.
Do it (:

cURL Page Scraping Script
December 16, 2008
Using cURL to scrape pages for specific data is one of the most important things I do when compiling databases. I’m not just talking about scraping pages and reposting them here, either.
You can use cURL to grab the HTML of any viewable page on the web and then – most importantly – take that data and pick out the bits you need. This is the basis for link analysis scripts, training scripts and compiling databases from sources around the web; there’s an almost limitless number of things you can do.
I’m providing a simple PHP class here, which will use cURL to grab a page and then pull out any information between user-specified tags into an array. So, for instance, in our example you can grab all of the links from any web page.
The class is quite simple, and it’s fairly well commented.
In a nutshell, it does this:
1) Goes to specified URL
2) Uses cURL to grab the HTML of the URL
3) Takes the HTML and scans for every instance of the start and end tags you provide (e.g. <a> and </a>)
4) Returns these in an array for you.
Download taggrab.class.zip
<?php
class tagSpider {

    var $ch;
    var $html;
    var $binary;
    var $url;

    // constructor: set the default values
    function tagSpider() {
        $this->html = "";
        $this->binary = 0;
        $this->url = "";
    }

    // takes the url passed to it and.. can you guess?
    function fetchPage($url) {
        // set the URL to scrape
        $this->url = $url;
        if (isset($this->url)) {
            // start cURL instance
            $this->ch = curl_init();
            // this tells cURL to return the data
            curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, 1);
            // set the url to download
            curl_setopt($this->ch, CURLOPT_URL, $this->url);
            // follow redirects if any
            curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, true);
            // tell cURL if the data is binary data or not
            curl_setopt($this->ch, CURLOPT_BINARYTRANSFER, $this->binary);
            // grabs the webpage from the internets
            $this->html = curl_exec($this->ch);
            // closes the connection
            curl_close($this->ch);
        }
    }

    // takes the html, puts the data between the requested tags into an array
    function parse_array($beg_tag, $close_tag) {
        // match data between the specified tags
        preg_match_all("($beg_tag.*$close_tag)siU", $this->html, $matching_data);
        // return data in array
        return $matching_data[0];
    }
}
?>
So that is your basic class, which should be fairly easy to follow (you can ask questions in comments if needed).
To use this, we need to call it from another PHP file to pass the variables we need to it.
Below is tag-example.php, which demonstrates how to pass the URL and start/end tag variables to the class and pump out a set of results.
Download tag-example.zip

<?php
// include the tagSpider class (assuming you saved it as taggrab.class.php from the download above)
include "taggrab.class.php";

// URL to scrape and the start/end tags to pick out
$urlrun = "http://www.techcrunch.com";
$stag = "<a";
$etag = "</a>";

// Make a title spider
$tspider = new tagSpider();
// Pass URL to the fetch page function
$tspider->fetchPage($urlrun);
// Enter the tags into the parse array function
$linkarray = $tspider->parse_array($stag, $etag);

echo "Links present on page: ".$urlrun."<br><br>";
// Loop to pump out the results
foreach ($linkarray as $result) {
    echo $result;
    echo "<br>";
}
?>
So this code will pass the Techcrunch website to the class, looking for any standard a href links, and then simply echo these out. You could use this in conjunction with the SearchStatus Firefox plugin to quickly see which links Techcrunch is showing to bots and which of them are followed and nofollowed.
You can view a working example of the code here.
As I said, there’s so much you can do from a base like this, so have a think. I might post some proper tutorials on extracting data methodically, saving it to a database then manipulating it to get some interesting results.
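To give you an idea, here’s a rough sketch of that kind of thing: it reuses the tagSpider class above, pulls the href out of each scraped anchor tag and stores it along with a nofollow flag. The table name, columns and database credentials are made up purely for illustration – swap in your own.

<?php
// Sketch only: scrape a page, extract the hrefs and stash them in a database.
// The "scraped_links" table and the DB credentials below are hypothetical.
include "taggrab.class.php"; // assumed filename, based on the download above

$tspider = new tagSpider();
$tspider->fetchPage("http://www.techcrunch.com");
$linkarray = $tspider->parse_array("<a", "</a>");

$db = new PDO("mysql:host=localhost;dbname=scraper", "user", "pass");
$stmt = $db->prepare("INSERT INTO scraped_links (url, nofollow) VALUES (?, ?)");

foreach ($linkarray as $anchor) {
    // grab the href attribute from the raw anchor tag
    if (preg_match('/href=["\']([^"\']+)["\']/i', $anchor, $m)) {
        // flag whether the link is nofollowed
        $nofollow = preg_match('/rel=["\'][^"\']*nofollow/i', $anchor) ? 1 : 0;
        $stmt->execute(array($m[1], $nofollow));
    }
}
?>

From there you can start doing the interesting stuff – counting who links where, spotting nofollow patterns and so on.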
Enjoy.
Edit: You’ll of course need the cURL library installed on your server for this to work!
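If you’re not sure whether it is, a quick sanity check like this (just a sketch) will tell you before you start blaming the class:

<?php
// Check the cURL extension is loaded before using the tagSpider class
if (!function_exists('curl_init')) {
    die("cURL is not available - install/enable the PHP cURL extension first.");
}
$info = curl_version();
echo "cURL is available (version " . $info['version'] . ")";
?>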
Blogs Worth Reading
December 15, 2008
I’ve never done a round-up of the blogs I read before, which I guess is a bit selfish. So, in no particular order (and this isn’t a complete list), here are some of my favourite blogs, if you’re looking for some inspiration.
Dark SEO Programming is run by Harry. As he puts it, “SEO Tools. I make ‘em”. A great guy if you need help with coding and somewhat of a captcha guru, with a sense of humour. Definitely worth keeping up with. I wouldn’t be surprised if this guy starts making big Google waves in the next few years.
Ask Apache is a blog I absolutely love. Great, detailed tutorials on script optimisation, advanced SEO and mod_rewrite. AskApache’s blog posts are the kind of ones that live in your bookmarks, rather than your RSS Reader.
Andrew Girdwood is a great chap from BigMouthMedia I met last year (although I very much doubt he remembers that). Andrew seems to be a vigilante web bug hunter. What I like about his blog is that he’s usually the first to find the weird things going down with Google. This usually gets my brain rolling in the right direction for my next nefarious plan. ^_^
Blackhat SEO Blog run by busin3ss is always worth checking out. He was even kind enough to give me a pre-release copy of YACG mass installer to review (it’s coming soon – I’m still playing!). Apart from his excellent tools, his blog features the darker side of link building, which of course, interests me greatly.
Kooshy is a blog run by a guy I know, who.. Well I think he wants to remain anonymous (at least a little). He’s just got started again after closing down his last blog and moving Internet personas (doesn’t the mystery just rivet you?). Anyway, get in early, I think we can expect some good stuff from here. He’s already done a cool post on Pimpin’ Duplicate Content For Links.
Jon Waraas is run by.. Can you guess? Jon has something that a lot of even really smart Internet entrepreneurs are missing: good old-fashioned elbow grease. This guy is a workaholic and it pays off in a big way. Apart from time-saving posts on loads of different ways to monetise your site, build backlinks and flush out your competitors, I get quite a lot of inspiration from his constant stream of effort and ideas. I could definitely take a leaf out of his work-ethic book.
Blue Hat SEO is becoming one of the usual suspects really. If you’re here, you probably already know about Eli. Being part of my “let’s only do a post every few months club”, I love Eli’s blog because there is absolutely no fluff. He gets straight down to the business of overthrowing Wikipedia, exploiting social media and answering specific SEO questions. You’ll struggle to find higher quality out there.
SEO Book is probably the most “famous” blog I’m going to mention here. Aaron started off at a disadvantage because, to be honest, I thought he was a massive waste of space for quite a while. (I guess that’s what happens when you spend your SEO youth on Sitepoint listening to the people with xx,xxx posts on there.) I bought his SEO Book and, for me at least, it was way too fluffy. I’m pleased he’s started an SEO training service now as it represents much better value. I’m sure he was making a lot of money from his SEO Book, but perhaps milked it too long (like I probably would have). Anyway, I kept with his blog and I’ve been impressed with his attitude and posts. He’s done some really cool stuff, like the SEO Mindmap and, more recently, a keyword strategy flowchart which would be useful for those looking for a more structured search approach. He’s also written about algorithm weightings for different types of keywords and of course has some useful SEO Tools.
Slightly Shady SEO – Great name, great blog. Although XMCP will probably take it as an insult, I’ve always regarded Slightly Shady as the blog most similar to mine on this list. Maybe it’s because I wish I’d written some of the posts he has, before he did, hehe. Again, a no-BS approach to effective SEO; whether he’s writing about Google’s User Data Empire, hiding from it or site automation, it’s all gravy.
The Google Cache is a great blog for analytical approaches to SEO. There are some awesome posts on Advanced Whitehat SEO and using proxies with search position trackers. I like.
SEOcracy is run by a lovely database overlord called Rob. Rob’s a cool guy, he was kind enough to donate some databases to include in the Digerati Blackbox a while back. Most of his databases are stashed away in his content club now, which is well worth a look in. He’s also done some enlightening posts on keyword research, stuffing website inputs and Google Hacking.
This is all I’ve got time for now – apologies if I’ve missed you. There may be a Part II in the near future.
Understanding Optimum Link Growth
December 12, 2008
Good evening all and Merry Christmas to all those who celebrate this time of year (you Pagans, you!). Rather than sit around the fire talking about yesteryear and smashing whiskey glasses into the fire, I’d like to talk to you about the far more interesting subject of link growth.
Link Growth on The Intertubes
For the context of this conversation (and by that I mean one-way lecture), I am assuming that everyone defines link growth as the rate at which a domain as a whole, and its specific pages, gain new backlinks – and, more importantly, how quickly search engines discover and “count” those backlinks.
I’ve blogged about link velocity before and generally summarised that it is, of course, a factor in how well your website ranks. However, as with most SEO topics, the devil is in the detail and there are a lot of myths about the detail. So I would like to discuss:
1) What signals do “good” links and “notsogood” links give to your website?
2) How does domain age and your current backlink count play a part in determining your “optimal” link velocity?
3) Can you be harmed by incoming links?
These are what I believe are some of the most important factors (definitely not all of them) contributing to link growth / velocity. As I want to have this blog post finished by Christmas, I’m going to try and stick to these three core points, although I’m sure I’ll end up running off at a tangent like I usually do. If, however, you think I’ve missed something critical, drop me a comment and I’ll see if I can do a follow-up.
The difference between trust & popularity
When talking about links, it’s important to realise that there is a world of difference between a signal of trust and a signal of popularity. They are not mutually exclusive and to rank competitively, you’ll need signals of both trust and popularity, but for now realising they are different is enough.

For instance: Michael Jackson is still (apparently) very popular, but you wouldn’t trust him to babysit your kids now, would you? The guy down the road in your new neighbourhood might be the most popular guy in your street, but you’re not going to trust him until someone you know well gives him the thumbs up.
So for your site to rank well, Google needs to have a degree of trust in it (e.g. source of incoming links, domain age, site footprints) to ensure you’re not just another piece of two-bit webscum, and it needs to know your content is popular (i.e. good content, link velocity, types of links). As I’ve already said, I’m not going to get into a drawn-out debate about content here – just looking at links.
What comes first, trust or popularity?
It doesn’t really make much logical sense that you’d launch a website with no fanfare and then get a stream of hundreds of low-quality links every week.
This kind of sits well with the original plan of the PageRank algorithm, which, let’s not forget, was originally trying to calculate the chance that a random surfer clicking around the web will bump into your site. This notion of a random surfer clicking random links gave Google an excellent abstraction from which to work out the whole “page authority” idea that the lion’s share of their algorithm sprang from.
Nowadays, you’ll hear lots of people trumpeting about going after quality (i.e. high-PR links) rather than lots of “low quality” (low-PR) links while trying to remain relevant. From the algorithm-origins point of view, higher-PR pages simply have more of these virtual random surfers landing on them, so there’s more chance of a random surfer clicking your link.
Looking back at “time zero”, when PageRank first started to propagate around the web, all sites were equal apart from internal PR stacking, so PageRank was actually collected by raw numbers of links rather than this “quality” (high-PR) angle – which is really just a cumulative effect of the PageRank algorithm (at least in its original form).
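To make the random-surfer idea concrete, here’s a toy sketch of the originally published PageRank formula in PHP. The four-page “web” is completely made up and this is the paper’s maths, not whatever Google actually runs today – it’s only here to show how raw link counts at time zero turn into the cumulative “quality” effect.

<?php
// Toy PageRank sketch over an invented four-page web graph.
$links = array(              // who links out to whom
    'A' => array('B', 'C'),
    'B' => array('C'),
    'C' => array('A'),
    'D' => array('C'),       // D has no inlinks, but its link still passes value
);

$d = 0.85;                   // damping factor: chance the random surfer keeps clicking
$pages = array_keys($links);
$n = count($pages);
$pr = array_fill_keys($pages, 1 / $n);   // everyone starts equal - "time zero"

for ($iter = 0; $iter < 20; $iter++) {
    $next = array_fill_keys($pages, (1 - $d) / $n);
    foreach ($links as $page => $outlinks) {
        // each page shares its current PR equally across its outgoing links
        foreach ($outlinks as $target) {
            $next[$target] += $d * $pr[$page] / count($outlinks);
        }
    }
    $pr = $next;
}

arsort($pr);
print_r($pr); // C ends up on top: more pages (and a better-fed page) link to it
?>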
Hopefully you’re still with me and not bored of going over fundamentals, but without this level of understanding you’ll have a job getting your head around the more advanced concepts of link growth. Keep in mind that I’m talking about pure PageRank in its original form here (I’m sure it’s been updated since it was published), not ranking factors as a whole. To be honest, when I’m ranking websites (which I’m pretty good at), PageRank normally plays a very, very small role in my decision making; it is, however, useful as an abstract concept when planning linking strategies.
The point I’ve been alluding to here is that for Google to buy into the fact that yes, your site is getting lots of natural “run of the mill” links, you will first need links from higher-PageRank pages (or authoritative pages, which are slightly different – bear with me). This line of thinking is of course assuming you don’t use a product like Google Analytics – (“Googlebot: Hmm, 58 visitors per month and 1,200 new incoming links per month, makes perfect sense!”).
Google is also pretty good at identifying “types” of websites and marrying this up to trust relationships. So, for instance, I think most people would like a link from the homepage of the BBC News website – it’s a whopping PR9 and has bucketloads of trust. Here’s a question though: is it a “relevant” link? The BBC News website covers a massive variety of topics, as most news sites do, so what is relevant and what is not pretty much depends on the story – and the stories cover all topics. Does a link from the BBC News site mean your site is “popular”? No (although it might make it so). Here’s a good question to ask yourself: between these two scenarios, which is more believable?
1) Brand new site launched :: Couple of links from small blogs :: Gets 2,000 links in first month
2) Brand new site launched :: 1 link from the BBC News homepage :: Gets 2,000 links in first month
Of course, you’ve hopefully identified situation 2 as the far more likely candidate. Let’s consider what Google “knows” about the BBC website:
Googlebot says:
1) I know it’s a news website (varied topics)
2) I know millions of other sites link to it (it’s incredibly popular)
3) Lots of people reference deep pages (the content is of great quality)
4) I see new content hourly as well as all the syndicated content I’m tracking (Fresh – as a news site should be)
5) It’s been around for years and never tried to trick me (another indicator of trust)
6) If they link to somebody, they are likely to send them lots of traffic (PR)
7) If they link to somebody, I can be pretty sure I can trust whoever they link to
Despite its critics, I’m a big believer in (at least some kind of) a TrustRank system. It makes perfect sense and, if you haven’t read the PDF, it’s very much worth doing so. In a hat tip to the critics, it is incredibly hard to prove: because of the dynamic nature of the web, it is almost impossible to separate the effects of PageRank, relevance, timing, content and the myriad of other glossary terms you could throw at any argument. However, without leaps of faith no progress would be made, as we’re all building on theory here.
Side note: while I’m talking about experimentation and proof, I’m still chipping away at my SEO Ranking Factors project (albeit more slowly than I’d like) and I’ll be willing to share some scripts for “tracking TrustRank” in the new year – dead useful stuff.
Okay, the point I’m making here is that these high-trust/authority sites (whatever you want to call them) are a stepping stone to greater things. I would agree with the whitehat doctrine that yes (if it’s your own domain, at least) you will require links from these sources if you are to rank well in the future. We’ll look at some examples of how to rank without those links later (:
Trust needs to come before mass popularity, and there are other things you may want to consider apart from just scanning websites and looking for as much green bar as possible. There are other mechanisms which, while I don’t believe Google is using them to the full extent it should (even when they play around with that goddamn WikiSearch – mustn’t get started on that), are still worth thinking about.
So, looking at it from a Wikinomics angle: these signals are less trustworthy, but being on the front page of Digg, being popular on StumbleUpon or having lots of Delicious bookmarks could all be signals of trust as well as popularity (although, at the moment at least, they are easier to game). I would expect that before Google can use these types of signals as strong ranking factors, there will need to be more accountability (i.e. a mass information empire) around user accounts. This is perhaps one of the things that could make WikiSearch work: being linked to your Google Account, Google can see if you use Gmail, search, Docs, video, Blogger, Analytics – the list goes on – so it’s going to be much harder to create “fake” accounts to boost your popularity.
Domain age and link profiles
Domain age definitely has its foot in the door in terms of ranking; however, having an old domain doesn’t give you a laminated backstage pass to Google rankings. The most sense you’re going to get out of looking at domain age comes from overlaying it with a link growth profile, which is essentially the time aspect of your link building operation.
Your natural link growth should have an obvious logical curve when averaged out, probably something like this:
[Graph: natural link growth – links gained per week rising steadily as the site ages]
This roughly shows that during natural (normalised) organic growth, the number of links you gain per day/week/month will increase (your link velocity goes up). This is an effect of natural link growth, discovery and more visitors to your site. Even excusing my horrific graph-drawing skills, the graph is pretty simplified.
How does this fit into link growth then?
I’ll be bold and make a couple of statements:
1) When you have established trust, even the crappiest of crap links will help you rank (proof to come)
2) The more trustage (that’s my new term for trust over time/age), the greater the “buffer” you have for building links quickly
Which also brings us to two conclusions:
3) Straying outside of this “buffer zone” (i.e. 15,000 low-quality new links in week 1) can see you penalised.
4) If you’ve got great trust you can really improve your rankings just by hammering any crap links you like at the site.
So, going along with my crap-o-matic graphs:
[Graph: the link-building “buffer zone” widening over time alongside the natural link growth curve]
As I’ve crudely tried to demonstrate in graphical form, your “buffer zone” for links increases almost on a log scale, along with your natural links. Once you’ve established a nice level of domain authority, it’s pretty much a free-for-all with links, within reason.
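If it helps to see the buffer idea as numbers rather than my scribbles, here’s a tiny sketch. Every figure in it – the starting links per week, the growth rate, the 3x buffer multiplier – is invented purely for illustration; none of it is measured from Google.

<?php
// Purely illustrative numbers: a made-up "natural" growth curve and an
// equally made-up buffer multiplier, just to put the graphs above into figures.
$naturalPerWeek = 10;        // hypothetical links/week for a brand new site
$growthRate     = 1.15;      // velocity creeps up as the site gets discovered
$bufferMultiple = 3;         // assumed headroom once trust is established

for ($week = 1; $week <= 12; $week++) {
    $natural = round($naturalPerWeek * pow($growthRate, $week - 1));
    $buffer  = $natural * $bufferMultiple;
    echo "Week $week: ~$natural natural links, rough safe ceiling ~$buffer\n";
}
?>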
I s’pose you’re going to want some proof for all these wild claims, aren’t you?
Can incoming links harm your website?
The logical answer to this would be “no”. Why would Google have a system in place that penalises you for bad incoming links? If Google did this, they would actually make their job of ranking decent pages much harder, with SEOs focusing on damaging the competition rather than working on their own sites. It would be a nightmare, with a whole sub-economy of competitor disruption springing up.
That’s the logical answer. Unfortunately, the correct answer is yes. I’ll say it again for the scan readers:
It is possible to damage the rankings of other websites with incoming links
Quote me if you like.
Now, by “bad links” I don’t mean the local blackhat viagra site linking to you – that will most likely have absolutely no effect whatsoever. Those kinds of sites, which Google classes as a “bad neighbourhood”, can’t spread their filth by just linking to you, let’s be clear on that. You’re more at risk if someone tricks you into linking to a bad site with some kind of Jedi mind trick.
There are two ways I’ve seen websites’ rankings damaged by incoming links:
1) Hopefully this one is obvious. I experienced it myself after registering a new domain and putting a site up 2 days later – which ranked great for the first couple of weeks. Then, well.. I “accidentally” built 15,000 links to it in a single day. Whoops. I never saw that site in the top 100 again.
2) There is a reliable method for knocking pages out of the index, which I’ve done (only once) and seen others do many, many times. Basically, you’re not using “bad” links as such – by that I mean they’re not from dodgy/blackhat or banned sites; they are links from normal sites. Say, for instance, you find a sub-page of a website ranking for a term like “elvis t-shirts” (a random term – I don’t even know what the SERPs are for it) with 500 incoming links to that page. If you get some nice scripts and programs (I won’t open Pandora’s box here – if you know what I’m talking about, then great) and drop 50,000 links over a two-week period with the anchor text “buy viagra”, you’ll find, quite magically, that you have totally screwed Google’s relevancy for that page.
I’ve seen pages absolutely destroyed by this technique, going from 1st page to not ranking in the top 500 – inside of a week. Pretty powerful stuff. You’ll struggle with root domains (homepages) but sub-pages can drop like flies without too much problem. Obviously, the younger the site the easier this technique is to achieve.
You said you could just rank with shoddy links?
Absolutely true. Once you’ve got domain authority, it’s pretty easy to rank with any type of link you can get your hands on, which means blackhat scripts and programs come in very useful. To see this in effect, all you have to do is keep your eye on the blackhat SERPs. “Buy Viagra” is always a good search term to see what the BHs are up to. It is pretty common to see Bebo pages, Don’t Stay In pages – or any of the myriad of other authoritative domains with user-generated content – ranking in the top 10 for “Buy Viagra”. If you check out the backlink profiles of these pages you will see, surprise, surprise, that they are utter crap, low-quality links.
The domains already have trust and authority – all they need is popularity to rank.
Trust & Popularity are two totally different signals.
Which does your site need?
We have learnt:
1) You can damage sites with incoming links
2) Trust & Popularity are two totally different things – don’t just clump it all in as “PageRank”
3) You can rank pages on authority domains with pure crap spam links (:
Good night.