The Google bot is a strange character. He can be a friendly sort of bot, but if you send him down the wrong path or into a dead end, he can do a bit of damage to your site's ranking. Now, I can imagine you are thinking that this is impossible, so I'll explain.
We have already spoken about robots.txt and how it is used to stop bots from visiting places we don't want them to.
Doing this actually creates another problem, especially for your site's SEO. The pages you put into robots.txt are linked from places that you do want the bots to go, and naturally the bots will want to follow those links to find out where they lead, as all bots should.
The problem is that every time a link leads to a dead end (a URL blocked in robots.txt), it produces and records what Google terms a soft 404 error, better known as "File not found". At this point you would think that the bots would recognise that they are being blocked by a robots.txt file; unfortunately they can't, so we, the admins, have to come to their rescue again.
What we need to do is set up a redirect for each of the URLs in robots.txt. Before you get to work on that, though, think: if you have done your robots.txt properly, you could have over 700 URLs in there, and setting up a redirect for every one of them, while not impossible, would be a little tedious. Thankfully we have another weapon in our arsenal. It's called .htaccess, that file that everyone told you to stay away from. Well, I'm telling you not to stay away; in fact .htaccess is a very powerful tool for the admin and for SEO.
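To see why the one-by-one approach gets tedious, here is what it would look like in .htaccess using Apache's Redirect directive (the paths are just examples; yours would come from your own robots.txt):

```apache
# One Redirect directive per blocked URL - example paths only
Redirect 301 /printpage.html /index.html
Redirect 301 /members-list.html /index.html
# ...and so on, once for every URL in robots.txt
```

With hundreds of entries that file would be a nightmare to maintain, which is why the single-directive approach below is the better route.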
The first thing to note is that it has no extension, so we know it's not an HTML, CSS or PHP file, and neither is it a script file. So what is it?
The "." is not actually a command; on Unix-like servers a leading dot simply marks the file as hidden. What makes .htaccess special is that the server reads it on every request and applies the server-side configuration instructions it contains. It is because it carries those server-side instructions that you are warned away from it, and it is exactly those instructions that we need to use to our advantage.
If your site doesn't have a .htaccess file, you need to create one, and that is easier than it sounds; some forum and blog software is even programmed to write to .htaccess to save you the job.
Open up your text editor and create a new blank file, then save it as .htaccess. Some editors will require you to wrap the name in quotes to stop them adding an extension; some will save it correctly anyway. If it saves as an HTML file, save it again as ".htaccess" (including the quotes).
Still in your text editor, go to line 1 and write the following:
ErrorDocument 404 /notfound.html
Save it and upload it to your root folder via FTP.
What this command does is alter the way the 404 error works: it tells the server that instead of serving the standard 404 error page the host has set, it should switch to a page called notfound.html.
Now there is another problem: you don't have a file on your system called "notfound.html", so you have to create one.
The page should include both your normal header and footer; the more it looks like a page from your site, the better. It should also include at least one link back to the site that is obvious to the user; the bot will find it anyway.
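A minimal sketch of such a page might look like this (the headings, link targets and header/footer comments are just placeholders; swap in your own site's template):

```html
<!DOCTYPE html>
<html>
<head><title>Page not found - YourSite</title></head>
<body>
  <!-- your normal site header goes here -->
  <h1>Sorry, that page doesn't exist</h1>
  <p>Try the <a href="/index.html">home page</a> or the
     <a href="/sitemap.html">site map</a> instead.</p>
  <!-- your normal site footer goes here -->
</body>
</html>
```

The important parts are the plain-language message for the visitor and at least one real link for the bot to follow.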
Most forum or site platforms have a means of creating new pages; if you don't know how to do it, there are tutorials on the internet. Luckily for me, the platform I use has a plugin to help you create new pages, and that is how the one on my site was made.
Once you have created your page, upload it to your root folder. Now we need to get the true URL of the 404 error page. To find it, type in a URL that you know doesn't exist on your site; the URL it goes to will be in the browser bar. Everything after the last trailing slash has to be added before notfound.html in the .htaccess file, e.g.:
If the URL of the 404 page is http://www.yoursite.com/misc.php?, then the command in .htaccess should look like this:
ErrorDocument 404 /misc.php?page=notfound.html
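While you have the file open, it's worth knowing that the same ErrorDocument directive handles other status codes too, so you can extend the idea later. A sketch (the forbidden.html and servererror.html page names are assumptions; create whichever pages you actually need):

```apache
# Custom error pages - notfound.html is the one built above,
# the other two page names are examples only
ErrorDocument 404 /misc.php?page=notfound.html
ErrorDocument 403 /misc.php?page=forbidden.html
ErrorDocument 500 /misc.php?page=servererror.html
```

One directive per status code, and each points at a friendly page of your own instead of the host's default.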
Now, using FTP, upload the edited .htaccess file to your root folder. To test that it is working, try entering a URL that doesn't exist again; this time your friendly page should pop up instead of the default 404 page from the host.
Now when a bot hits the robots.txt dead end, instead of reporting a soft 404 error, it follows one of the links you have set up on the notfound.html page and carries on crawling your site.
Doing this will stop Google listing your pages in the SERPs under "Page Not Found"!