What's the deal with Google Sitemaps?
Dave, I just read about Google having some sort of sitemap feature that lets you ensure that their spider sees all of your pages. I kinda wonder whether this is a good thing or not. What have you been hearing about it?
I admit I haven't had much time to dig into exactly what it is and what it means, but I'm lucky that my colleague Michael Motherwell, head of Australian WMS Consulting, shared his own thoughts on this new Google feature. Here's what he had to say:
The goal of SiteMaps is simple: save costs, improve their product, generate buzz.
As background, the single hardest part of a search engine is the crawl scheduler. Try to imagine a programme that runs permanently, has to schedule six billion documents for crawling (some crawled every five to ten minutes, some once a month, and many in between), with multiple rules for NOT crawling pages and a to-crawl list that grows haphazardly every second, all utilising thousands of servers and, well, my head spins.
The key to better crawling has always been finding all the URLs to crawl faster and more accurately, which is why a site map helps: a list of every URL on a site gets those URLs into the scheduler sooner. The problem is that we now have eight-million-page sites (Amazon and eBay) and millions of sites overall.
As an example of the problems this causes, if a site has 100,000 pages, how many pages need to be crawled before a search engine knows about them all? The bottom 10,000 pages of such a site will probably be linked to just once, from a category page that is itself several links deep. That means Google needs to crawl many levels of a site before finding all its pages, and to find the web's freshest and newest content, like, say, a new printer, Google needs to crawl deep and often. That is a heck of a lot more pages to crawl to find all the URLs on a site than the one XML file Sitemaps offers, that is for sure.
While the old way was bandwidth intensive and, therefore, costly to both sides, now when Google arrives at a site, it has a nice neat list of all pages and, if people are smart, all NEW pages, and starts with the new and works down. SiteMaps has the potential to reduce costs for Google and site owners without sacrificing quality or freshness.
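That "nice neat list" is just a small XML file. Here's a minimal sketch of what one looks like (the URLs are hypothetical; the namespace shown is from the sitemap protocol specification that grew out of Google's format, so check Google's own documentation for the exact schema they expect):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- A fresh page: tell the crawler it changed recently and changes often -->
  <url>
    <loc>http://www.example.com/printers/new-printer.html</loc>
    <lastmod>2005-06-03</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <!-- A stable page: the crawler can safely leave it alone -->
  <url>
    <loc>http://www.example.com/about/history.html</loc>
    <changefreq>never</changefreq>
  </url>
</urlset>
```

One file like this replaces the many levels of link-following described above, which is where the bandwidth savings come from.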
How good is that? What this means for site owners is that rather than a top-down crawl starting at the home page, we might see bottom, then top, then middle, e.g. specific products, home page, categories. No more "I have 10,000 pages and only 50 are indexed". No more robots all over a site constantly. Now sites should have closer to 100% of their pages indexed, with new pages indexed sooner and a fresher refresh date for changing content. Assuming, of course, this all works out...
It also means that rather than Google crawling 100,000 pages of large sites daily, weekly or monthly, it can crawl less often with more certainty. That, in my book, equals less bandwidth expense per page for Google, less bandwidth expense overall for both sides (the W3C standards page that never changes will get crawled once in a blue moon, as its owners can set its recrawl frequency to "never") and a fresher, more complete index.
If sites get this right (and they have every reason to), they can now start to tell Google which pages have changed. Have you had a product recalled? Put it in your "new pages" sitemap and it will get updated sooner. Product no longer sold? Ditto.
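For database-driven sites, keeping that "new pages" list current is easiest if the sitemap is generated rather than hand-edited. Here's a minimal Python sketch of the idea, using only the standard library (the page list and URLs are hypothetical; a real site would pull them from its database or CMS):

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical pages: (URL, date last modified, how often it changes).
# A recalled or discontinued product just gets a fresh lastmod date.
pages = [
    ("https://example.com/", date(2005, 6, 1), "daily"),
    ("https://example.com/products/new-printer", date(2005, 6, 3), "weekly"),
]

def build_sitemap(pages):
    """Render the page list as a sitemap-protocol XML document."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for loc, lastmod, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")

xml = build_sitemap(pages)
```

Regenerating and resubmitting this file whenever content changes is exactly the "tell Google which pages have changed" mechanism described above.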
This last point makes for a better Google index, IMHO. Finding changed, deep content and new, fresh content sooner is fantastic all round! I don't know about anyone else, but if I were Google, being better is something I would want, not least because the PR potential is huge, with Gates complaining about Google's bag of tricks and wanting to take them out of the picture.
As an aside, many moons ago GoogleGuy made a joking comment at WebmasterWorld: "I put a page up 7 seconds ago and Google hasn't found it - what is going on?" At the time I thought, "Funny, but how could Google ever get to that stage? To do so would need a recrawl rate of every page every five seconds, or webmaster help." Well, here is that help! Google can know about a site's newest pages ASAP, and webmasters can get timely content indexed faster and more easily. Win (Google) - win (website owner) - win (searchers).
So, my $0.02 (for what that is worth): no conspiracies, no ulterior motives, just good old-fashioned business logic. "Reduced costs == increased profits, a better product == market leadership and better PR, innovation == keeping Microsoft in third place". Personally, I wish *I* had an idea for my business that reduced my costs, improved my products AND generated solid PR and plenty of press. I can't imagine anyone would need any more motivation for an idea than that.
So, rather than look for the cloud behind this silver lining, I reckon we give it a collective go and see what happens. A world in which webmasters and crawlers are partners rather than enemies is good for all. Personally, the only issue I have is that, in true US tradition (sorry, had to chuck that in), this is a unilateral exercise. I just wish, as with the nofollow link attribute, that Google had consulted the W3C and that this was a ratified standard that had multi-vendor buy-in.
If you want to learn more about Google Sitemaps, here's the main What is Google Sitemaps page, and here's a page that covers Frequently Asked Questions. Hope those help illuminate this interesting topic!