How can I use robots.txt meta information to stop being spidered?

As far as I know, there's a big problem with duplicate content in the search engines, so I am paranoid about my dynamically generated content and want to use either a "robots.txt" file or a "robots.txt meta exclusion" to ensure that search engines don't see my content again and again. How do I do that?

You are right to be concerned about duplicate content on your site, but only if you're doing something unusual or trying to squeeze extra pages out of your content. If, for example, your blog entries are filed in more than one category, so that the same information appears on its own "permalink" page, on a date-based archive page and on a category-based page, you should be okay. That's a very common way to organize a blog, so it's hard to imagine that a search engine is going to penalize you for it.

Nonetheless, if you are concerned about your dynamically generated content, there are two ways you can tell the search engines (Google, Yahoo, MSN Live, etc.) to skip those pages when building an index of your site. The first, and perhaps easiest, is to create a robots.txt file, a process that is very well documented at RobotsTxt.org and also discussed on the Google site itself, starting with the overview article "How do I use a robots.txt file to control access to my site?"

Failing that, you can add a specific meta tag to the head section at the top of each dynamically generated page, telling search engines not to index the page or follow its links. The benefit of this approach is that it's probably quite easy to tweak the template that's used to build these dynamic pages, so it might well be a 60-second fix.
The specific line should look like this:

<meta name="robots" content="noindex,nofollow">

As for the greater question of whether you really need to do this at all, you might want to read this useful article: "Making your site search engine friendly", specifically the author's commentary about robots.txt and dynamic content.
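
If it helps to see both approaches side by side, here is a minimal Python sketch. Everything site-specific in it is an assumption for illustration: the paths /cgi-bin/ and /archives/by-date/ are placeholders for wherever your dynamic pages live, and is_allowed and render_head are hypothetical helper names, not part of any real template system. The first half checks a sample robots.txt with Python's standard urllib.robotparser module; the second half shows one way a template might add the meta tag only to dynamic pages.

```python
from html import escape
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The two Disallow paths are placeholders for
# the directories that hold your dynamically generated pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /archives/by-date/
"""

# Parse the rules from the string instead of fetching them over the network.
_parser = RobotFileParser()
_parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, agent: str = "*") -> bool:
    """Return True if the sample rules let `agent` crawl `path`."""
    return _parser.can_fetch(agent, path)


# The meta tag from the article, unchanged.
ROBOTS_META = '<meta name="robots" content="noindex,nofollow">'


def render_head(title: str, dynamic: bool) -> str:
    """Build a page head, adding the robots meta tag only for dynamic pages."""
    lines = ["<head>", f"<title>{escape(title)}</title>"]
    if dynamic:
        lines.append(ROBOTS_META)
    lines.append("</head>")
    return "\n".join(lines)


if __name__ == "__main__":
    for path in ("/cgi-bin/search.cgi", "/articles/robots.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
    print(render_head("Search results", dynamic=True))
```

It's usually best to pick one approach per page: a crawler that robots.txt keeps out never fetches the page, so it never sees the meta tag on it.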
Categorized: HTML and CSS (Article 7114, written by Dave Taylor)
Tagged: google, msn live, robots.txt, seo, yahoo
I would suggest using your robots.txt file very carefully; otherwise you may, by mistake, prevent spiders from crawling important pages of your site.