Industry guru Dave Taylor offers free tech support on a wide variety of technical and business topics, including HTML, Apple iPhone, online advertising, Cascading Style Sheets, Web design, management, Unix, Linux, search engine optimization, online dating, Mac OS X, shell script programming and Microsoft Windows.

How can I use robots.txt meta information to stop being spidered?

As far as I know, there's a big problem with duplicate content in the search engines, so I am paranoid about my dynamically generated content and want to use either a "robots.txt" file or a "robots.txt meta exclusion" to ensure that search engines don't see my content again and again. How do I do that?


Dave's Answer:

You are right to be concerned about duplicate content on your site, but only if you're doing something unusual or trying to squeeze extra pages out of your content. If, for example, you have blog entries in more than one category, so that the same information appears on its own "permalink" page, on a date-based archive page and on a category-based page, you should be okay. That's a very common way to organize a blog, so it's hard to imagine a search engine penalizing you for it.

Nonetheless, if you are concerned about your dynamically generated content, there are two ways you can tell the search engines (Google, Yahoo, MSN Live, etc.) to skip those pages when building an index of your site.

The first, and perhaps easiest, is to create a robots.txt file, a plain-text file that lives at the root of your site. The process is very well documented at RobotsTxt.org and is also discussed on the Google site itself, starting with the overview article "How do I use a robots.txt file to control access to my site?"
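To make that concrete, here is a minimal sketch of what such a file might contain, along with a quick way to check which URLs it blocks before you upload it. The directory names (/cgi-bin/ and /search/) and the example.com URLs are placeholders I've made up for illustration, so substitute the paths your dynamic pages actually live under. The check uses the robots.txt parser from Python's standard library, which may not match every search engine's interpretation exactly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The two Disallow paths are
# placeholders, not paths from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /search/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up URLs: one static page and two dynamically generated ones.
    for url in (
        "http://www.example.com/about.html",
        "http://www.example.com/cgi-bin/archive.cgi?id=42",
        "http://www.example.com/search/?q=robots",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(f"{verdict}: {url}")
```

Running a check like this against a list of your most important pages is a cheap way to catch a Disallow line that's broader than you intended.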

Failing that, you can also add a specific meta tag to the <head> section of each dynamically generated page that tells search engines not to index the page or follow its links. The benefit of this approach, of course, is that it's probably quite easy to tweak the template that's used to build these dynamic pages, so it might well be a 60-second fix. Keep in mind that a crawler has to fetch a page to see this tag, so don't also block that same page in robots.txt.

The specific line should look like this:

<meta name="robots" content="noindex,nofollow">
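Once you've changed the template, it's worth confirming that the tag really shows up in the generated pages. Here's a rough sketch of one way to check, using the HTML parser in Python's standard library. The sample page and the robots_directives() helper are my own inventions for illustration; in practice you would feed it the HTML your site actually produces.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value is None if absent.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives present in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Made-up sample page standing in for one of your dynamic pages.
SAMPLE_PAGE = """<html><head>
<title>Dynamically generated page</title>
<meta name="robots" content="noindex,nofollow">
</head><body><p>Hello</p></body></html>"""

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

If the set comes back empty for a page you meant to exclude, the template change didn't take.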

The greater question is whether you really need to do this at all. For that, you might want to read this useful article: "Making your site search engine friendly", specifically its commentary about robots.txt and dynamic content.




Comments

I would suggest using your robots.txt file very carefully; otherwise you may mistakenly prevent spiders from crawling important pages of your site.
You can create it using any text editor. Just type in the directives, save the file as robots.txt, and upload it to the root of your server. It's that simple.
It uses a small set of directives that are read by robots.
For example, the following code will block all spiders from crawling your entire site:
User-agent: *
Disallow: /

Posted by: Harry W. at January 6, 2007 3:59 AM

© 2002 - 2009 by Dave Taylor. All Rights Reserved.

"Ask Dave Taylor®" is a registered trademark of Intuitive Systems, LLC.