Saturday, August 19, 2006

Small Insight on the Googlebot

Most people don't really know the inner workings of Googlebot, or how it crawls and sees your site. Vanessa Fox posted a very helpful Q&A (questions and answers) about Googlebot.

If my site is down for maintenance, how can I tell Googlebot to come back later?
You should configure your server to return a status of 503 (Service Unavailable) rather than 200 (successful). That lets Googlebot know to try the pages again later.

This is useful info, because I've run into this a couple of times when searching for forums.
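
To make the 503 advice concrete, here is a minimal sketch of a maintenance page written as a plain Python WSGI app. The names MAINTENANCE_MODE and RETRY_AFTER_SECONDS are made up for this example, and the Retry-After header is a standard HTTP header that the Q&A itself doesn't mention, so treat it as an optional extra rather than something Google requires.

```python
from wsgiref.simple_server import make_server

# Hypothetical switch; a real site would probably read this from its config.
MAINTENANCE_MODE = True
# Assumed value: suggest that clients retry in an hour.
RETRY_AFTER_SECONDS = 3600


def app(environ, start_response):
    """Answer with 503 while in maintenance mode, otherwise with a normal 200."""
    if MAINTENANCE_MODE:
        body = b"Down for maintenance. Please check back soon.\n"
        start_response(
            "503 Service Unavailable",
            [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(RETRY_AFTER_SECONDS)),
            ],
        )
        return [body]

    body = b"Welcome back!\n"
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def run_local(host="127.0.0.1", port=8000):
    """Serve the app locally for a quick manual check (not called automatically)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling run_local() and opening the page in a browser should show the maintenance message with a 503 status. Most real sites would set this up in the web server config instead, but the idea is the same: the status code, not the page text, is what tells the bot to come back later.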

Is it better to use the meta robots tag or a robots.txt file?
Googlebot obeys either, but meta tags apply to single pages only. If you have a number of pages you want to exclude from crawling, you can structure your site in such a way that you can easily use a robots.txt file to block those pages (for instance, put the pages into a single directory).

I can't even remember how many times people have asked this; I even asked it myself when I was starting my first website. This is the best tip she could give to make people stop asking!

What Googlebot Won't Index
Say this is your robots.txt file:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/

Googlebot will crawl everything in the site other than pages in the cgi-bin directory.
For this robots.txt file:

User-agent: *
Disallow: /

Googlebot won't crawl any pages of the site.
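
If you want to check rules like these before uploading them, here is a rough sketch using Python's built-in urllib.robotparser. Keep in mind that this is Python's parser and not Google's, so it only approximates what Googlebot does; the helper name allowed and the sample paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The first robots.txt file from above.
FIRST_FILE = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
"""

# The second robots.txt file from above.
SECOND_FILE = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# With the first file, Googlebot uses its own section and ignores the "*" one.
print(allowed(FIRST_FILE, "Googlebot", "/index.html"))          # True
print(allowed(FIRST_FILE, "Googlebot", "/cgi-bin/search.cgi"))  # False
# Any other bot falls back to the "*" section and is blocked everywhere.
print(allowed(FIRST_FILE, "OtherBot", "/index.html"))           # False
# With the second file, Googlebot has no section of its own, so "*" applies.
print(allowed(SECOND_FILE, "Googlebot", "/index.html"))         # False
```

The thing to notice is that a bot with its own User-agent section follows only that section, which is why the first file lets Googlebot in even though everyone else is locked out.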

This is helpful if you have a forum and have set up a staff forum, or a private forum for members who donated, and you don't want the bot to index that part. If it does get indexed, anyone who clicks the link in the search results will just get a message that says, "You don't have permission to access this page."
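
For the private forum case, here is a small sketch that builds the robots.txt lines from a list of directories and then checks them with urllib.robotparser. The directory names forum/staff and forum/donors are invented for the example, so swap in whatever paths your forum software actually uses.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(private_dirs, user_agent="*"):
    """Build robots.txt text that blocks every directory in private_dirs."""
    lines = [f"User-agent: {user_agent}"]
    for directory in private_dirs:
        # Normalize to "/path/" so the rule covers the directory itself
        # and not other paths that merely start with the same letters.
        path = "/" + directory.strip("/") + "/"
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Hypothetical private areas of a forum.
robots_txt = build_robots_txt(["forum/staff", "/forum/donors/"])
print(robots_txt)

# Check the result with Python's parser (an approximation of a real crawler).
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("Googlebot", "/forum/staff/thread1"))    # False
print(parser.can_fetch("Googlebot", "/forum/general/thread1"))  # True
```

This only asks well-behaved bots to stay out. The forum's own permissions are still what actually keeps people out of the private sections.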

So these are just some tips on how to control Googlebot on your site.

Vanessa Fox's Entry | More Info... | Tools For Webmasters

