AdSense and robots.txt (Part 4)

Time to finish this series. Refer to Part 1, Part 2 and Part 3 if you're just joining us.

Testing Your robots.txt File

Until you're pretty sure about what you're doing (and even then) you should always test your robots.txt file before deploying it to your web server. This ensures that you don't accidentally block a crawler from pages you want crawled. This actually happened to me not long ago when I accidentally blocked the Mediabot from my Google Suggest Explorer. Oops.

There are many robots.txt validators available, but the one you want to use is Google's. Why? Because it supports Google's robots.txt extensions, so you'll know for sure if the Mediabot (or the AdWords crawler) will be allowed on the right pages or not.

Google's validator is found in the Google Webmaster Tools (GWT), a set of tools for webmasters and blog owners. If you've created a Google Sitemap, you'll already be familiar with GWT (see Google Sitemaps 101 for more information). GWT is free: all you need is a Google account.

To use the validator, you'll need to add the site you want validated to the list of sites registered with GWT. You don't even have to verify your ownership of the site to use the validator, so just go ahead and enter the URL of your site in the “Add Site” box at the top of the page. Once the site's been added, click on the “Manage http://……” link to get to a page that looks like this:

Click on the “robots.txt analysis” link to bring up the validator:

The page above is the GWT entry for my debt-free living site. As you can see, it shows you when the robots.txt file for that site was last read. This particular site doesn't have a robots.txt file. If it did, the contents of the file would be shown in the text box.

It doesn't matter whether or not a robots.txt file was found: you can just take the contents of your updated robots.txt file and paste them into that text box. You then scroll down to the next box and enter a list of URLs you'd like checked:

You scroll down again and select which Google crawlers you want to check the file against:

And then you press the Check button. The URLs you listed are checked against the rules listed in the first text box.

You'll have to use another validator if you want to check non-Google crawlers, but for AdSense publishers this validator is often all they need to use.
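
If you'd like a quick local sanity check before pasting anything into a validator, here's a minimal sketch using the robots.txt parser from Python's standard library. The rules, the /print/ directory and the example.com URLs are made up for illustration; Mediapartners-Google is the user-agent name the Mediabot goes by. Keep in mind that this parser follows the basic robots.txt rules only and, as far as I know, doesn't understand Google's extensions such as wildcards, so treat it as a rough first pass and not a replacement for Google's validator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: let the Mediabot in everywhere, but keep the
# Googlebot out of a made-up /print/ directory.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: Googlebot
Disallow: /print/
"""

# Hypothetical URLs to check against the rules above.
URLS = [
    "http://www.example.com/",
    "http://www.example.com/print/page1.html",
]


def check(robots_txt, urls, agents):
    """Return {(agent, url): True/False} for each crawler and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, url): parser.can_fetch(agent, url)
        for agent in agents
        for url in urls
    }


results = check(ROBOTS_TXT, URLS, ["Googlebot", "Mediapartners-Google"])
for (agent, url), allowed in results.items():
    status = "allowed" if allowed else "blocked"
    print(f"{agent:22} {status}  {url}")
```

With these sample rules, the Googlebot should be blocked from the /print/ page only, while the Mediabot is allowed on both URLs. Swap in your own file contents and URLs to try it on your site.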

So this ends our look at the robots.txt file. Now I need to apply what I just described to the pet fence guide and get the Googlebot to look at just one copy of the content while letting the Mediabot through to all the pages on the site.

Eric Giguere is the author of Uncommon AdSense and the award-nominated (that just means it lost!) blog Make Easy Money with Google and AdSense.

Comments

5 Responses to “AdSense and robots.txt (Part 4)”

  1. Satish Talim on June 29th, 2007 1:12 am

    I don’t have robots.txt on my site. Is it mandatory?

  2. Eric Giguere on June 29th, 2007 5:15 am

    No. The robots.txt file is used primarily to keep search engines out of certain areas/pages of your site. If you don’t have a need to do so, you don’t need a robots.txt file. If there’s no file, the search engines assume that they can crawl every page they find, unless the page itself has a meta tag indicating otherwise.

  3. Satish Talim on June 29th, 2007 7:39 am

    So far I had no robots.txt. Reading this post, I used GWT to find more info on robots.txt analysis. I saw that the default text shown of the robots.txt file was the contents of my index.html file and the status said that the file was in error. I have now kept an empty robots.txt. Let’s see what it has to say.

  4. Eric Giguere on June 29th, 2007 9:49 am

    If you had no robots.txt, nothing would have prevented Google from indexing your site UNLESS you had a “noindex” declaration in the home page’s meta tag.

    Is your website configured to redirect requests for missing/non-existent files (i.e. a 404 error) to your home page? That would explain what you saw.

  5. Google AdWords Case Study: Site Design : Make Easy Money With Google And AdSense on September 11th, 2007 12:12 pm

    [...] they won’t be very different from each other. Don’t wet your pants! If it worries you, exclude your landing pages from the search engines using your robots.txt file: that’s what it’s for. Just be sure, however, to let the [...]
