
How can I use robots.txt meta information to stop being spidered?
As far as I know, there's a big problem with duplicate content in the search engines, so I am paranoid about my dynamically generated content and want to use either a "robots.txt" file or a "robots.txt meta exclusion" to ensure that search engines don't see my content again and again. How do I do that?
You are right to be concerned about duplicate content on your site, but only if you're doing something unusual or trying to squeeze extra pages out of your content. If, for example, you have blog entries in more than one category, so that the same information appears on its own "permalink" page, on a date-based archive page and on a category-based page, you should be okay. That's a very common way to organize information on a blog, so it's hard to imagine a search engine penalizing you for it.
Nonetheless, if you are concerned about your dynamically generated content, there are two ways you can tell the search engines (Google, Yahoo, MSN Live, etc.) to skip those pages when building an index of your site. The first, and perhaps easiest, is to create a robots.txt file, a process that is very well documented at RobotsTxt.org and also discussed on the Google site itself, starting with the overview article How do I use a robots.txt file to control access to my site?
The second is to add a specific meta tag at the top of each dynamically generated page, inside its <head> section, telling search engines not to index the page and not to follow its links. The benefit of this approach is that it's probably quite easy to tweak the template used to build these dynamic pages, so it might well be a 60-second fix.
The specific line should look like this:
<meta name="robots" content="noindex,nofollow">
As for the bigger question of whether you really need to do this at all, you might want to read the useful article Making your site search engine friendly, particularly the author's commentary on robots.txt and dynamic content.
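If you go the meta tag route, it's worth confirming that the tweaked template really emits that line on every generated page. Here's a minimal sketch of such a check, assuming Python 3 and nothing beyond its standard html.parser module. The RobotsMetaParser class, the robots_directives() function and the sample page are hypothetical names made up for illustration, not part of any search engine's tooling.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        if name == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,nofollow".
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical output of a dynamic page template after the one-line tweak.
generated_page = """<html><head>
<title>Category: HTML and CSS</title>
<meta name="robots" content="noindex,nofollow">
</head><body>...</body></html>"""

found = robots_directives(generated_page)
print(sorted(found))      # ['nofollow', 'noindex']
print("noindex" in found) # True
```

You could run robots_directives() against a few pages fetched from your own site to make sure noindex is present on the duplicate pages and absent from the pages you do want indexed.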
Categorized: HTML and CSS (Article 7114)
Tagged: google, msn live, robots.txt, seo, yahoo
I would suggest using your robots.txt file very carefully; otherwise, you may accidentally prevent spiders from crawling important pages of your site.
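Following up on that caution, one cheap safeguard is to test your rules before you publish them. Python's standard library includes urllib.robotparser, which can answer "may this user agent fetch this URL?" for a given set of rules. The minimal sketch below assumes hypothetical /archive/ and /category/ paths for the duplicate pages and example.com URLs for the pages that must stay crawlable; the check_rules() function is likewise a made-up name, so substitute your own paths and URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /archive/ and /category/ are made-up examples of
# where a blog might put its date- and category-based duplicate pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /category/
"""

# Hypothetical URLs you definitely want crawled (home page, permalinks).
MUST_BE_CRAWLABLE = [
    "https://www.example.com/",
    "https://www.example.com/2008/01/how-to-use-robots-txt.html",
]

# Hypothetical URLs you expect the rules to block.
SHOULD_BE_BLOCKED = [
    "https://www.example.com/archive/2008/01/",
    "https://www.example.com/category/html-and-css/",
]


def check_rules(robots_txt, user_agent="Googlebot"):
    """Return a list of problems; an empty list means the rules behave as intended."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for url in MUST_BE_CRAWLABLE:
        if not parser.can_fetch(user_agent, url):
            problems.append("blocked by mistake: " + url)
    for url in SHOULD_BE_BLOCKED:
        if parser.can_fetch(user_agent, url):
            problems.append("still crawlable: " + url)
    return problems


problems = check_rules(ROBOTS_TXT)
if problems:
    for problem in problems:
        print(problem)
else:
    print("robots.txt rules look OK")
```

Adding a careless rule such as "Disallow: /2008/" to ROBOTS_TXT would make check_rules() report the permalink as blocked by mistake. Keep in mind that Python's parser uses the simple prefix matching of the original robots.txt convention and doesn't understand the * and $ wildcards that some search engines, Google among them, support, so patterns like those are better tested with the search engine's own tools.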