Industry guru Dave Taylor answers free tech support questions about a wide variety of business and technical topics, including blogging, Google AdSense, MySpace, Sony PSP, Apple iPod, Mp3 players, management, Linux, SEO, Mac OS X, Facebook, Twitter, LinkedIn and Microsoft Windows.

How can I use robots.txt meta information to stop being spidered?

As far as I know, there's a big problem with duplicate content in the search engines, so I am paranoid about my dynamically generated content and want to use either a "robots.txt" file or a "robots.txt meta exclusion" to ensure that search engines don't see my content again and again. How do I do that?


Dave's Answer:

Duplicate content on your site is a legitimate concern, but only if you're doing something unusual or trying to squeeze extra pages out of your content. If, for example, you have blog entries in more than one category, so that the same information appears on its own "permalink" page, on a date-based archive page and on a category-based page, you should be okay. That's a very common way to organize a blog, so it's hard to imagine that a search engine is going to penalize you for it.

Nonetheless, if you are concerned about your dynamically generated content, there are two ways you can tell the search engines (Google, Yahoo, MSN Live, etc.) to skip those pages when building an index of your site.

The first, and perhaps easiest, is to create a robots.txt file, a plain text file at the root of your site that tells crawlers which paths to skip. The process is very well documented at RobotsTxt.org and also discussed on the Google site itself, starting with the overview article "How do I use a robots.txt file to control access to my site?"
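
Before uploading a robots.txt file, it can help to check that its rules block what you expect and nothing more. Here is a minimal sketch in Python, using the standard library's urllib.robotparser. The paths /cgi-bin/ and /search and the example.com URLs are hypothetical stand-ins for wherever your dynamic pages live, not anything specific to your site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep all crawlers out of two dynamic areas.
# Replace these paths with the ones your own site actually uses.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /search
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if these robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file contents as a sequence of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/cgi-bin/archive.cgi?id=7",
        "http://www.example.com/blog/robots-txt.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(verdict, url)
```

Python's parser matches rules by simple path prefix, which is the original robots.txt convention. Individual search engines may support extra syntax, so treat this as a sanity check rather than a guarantee of how any one crawler will behave.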

Alternatively, you can add a specific meta tag to the head section of each dynamically generated page, telling search engines not to index the page or follow its links. The benefit of this approach is that it's probably quite easy to tweak the template that's used to build these dynamic pages, so it might well be a 60-second fix.

The specific line should look like this:

<meta name="robots" content="noindex,nofollow">
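
Once the template is changed, you may want to confirm that the tag really shows up in the generated pages. The following is a rough sketch in Python, using only the standard library's html.parser. The names RobotsMetaFinder and robots_directives are made up for this illustration, and the sample HTML is a stand-in for a page you would fetch from your own site.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # content looks like "noindex,nofollow"; split it into parts.
        for part in (attr.get("content") or "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives present in html."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<meta name="robots" content="noindex,nofollow">'
        "</head><body>Archive page</body></html>"
    )
    found = robots_directives(page)
    print("noindex" in found, "nofollow" in found)
```

Running a check like this against a handful of dynamic pages, and against a few pages you do want indexed, is a quick way to catch a template change that landed in the wrong place.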

As for the greater question of whether you really need to do this at all, you might want to read the useful article "Making your site search engine friendly", specifically its commentary about robots.txt and dynamic content.


Comments

I would suggest you use your robots.txt file very carefully; otherwise you may accidentally prevent spiders from crawling important pages of your site.
You can create it using any text editor. Just type in the rules, save the file as robots.txt and upload it to the root of your server. It's that simple.
It uses a special syntax that is read by robots.
E.g., the following rules will disallow all spiders from crawling your site:
User-agent: *
Disallow: /

Posted by: Harry W. at January 6, 2007 3:59 AM

© 2002 - 2008 by Dave Taylor. All Rights Reserved.

Note: This web site is for the purpose of disseminating information for educational purposes, free of charge, for the benefit of all visitors. We take great care to provide quality information. However, we do not guarantee, and accept no legal liability whatsoever arising from or connected to, the accuracy, reliability, currency or completeness of any material contained on this web site or on any linked site.
