Industry guru Dave Taylor offers tech support on technical and business topics, including iPhone, iPod, Microsoft Windows, Sony PSP, cellphones, online advertising, CSS, Web design, business, Unix, Linux, SEO, Mac OS X, and shell script programming.     


What's the deal with Google Sitemaps?

Dave, I just read about Google having some sort of sitemap feature that lets you ensure that their spider sees all of your pages. I kinda wonder whether this is a good thing or not. What have you been hearing about it?


Dave's Answer:

I admit, I haven't had much time to dig into exactly what it is and what it means, but I am lucky that my colleague Michael Motherwell, head of the Australian WMS Consulting, shared his own thoughts on this new Google feature. Here's what he had to say:



The goal of SiteMaps is simple: save costs, improve their product, generate buzz.

As background, the single hardest part of a search engine is the crawl scheduler. Try to think of a programme that runs permanently and has to schedule 6 billion documents for crawling, with some crawled every five to ten minutes, some once a month and many in between, with multiple rules for NOT crawling pages and a list to crawl that grows haphazardly every second, all utilising thousands of servers and, well, my head spins.
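To make the scheduling problem a little more concrete, here is a toy sketch of the core idea: a priority queue ordered by when each URL is next due. This is not how Google's scheduler works (nobody outside Google knows that); the function name, the URLs and the intervals are all made up for illustration.

```python
import heapq

def simulate_schedule(intervals, horizon):
    """Toy crawl scheduler (hypothetical, for illustration only).

    intervals: dict mapping URL -> minutes between crawls.
    horizon:   stop once the next crawl is due after this minute.
    Returns the list of (minute, url) crawls in the order they happen.
    """
    # Every URL is due immediately at minute 0.
    queue = [(0, url) for url in intervals]
    heapq.heapify(queue)
    crawled = []
    while queue and queue[0][0] <= horizon:
        due, url = heapq.heappop(queue)   # the URL that is due soonest
        crawled.append((due, url))
        # Reschedule the same URL one interval later.
        heapq.heappush(queue, (due + intervals[url], url))
    return crawled

# Made-up example: a news page every 5 minutes, an archive page every 20.
crawls = simulate_schedule({"/news": 5, "/archive": 20}, horizon=20)
print(len(crawls))  # 7 crawls: five of /news, two of /archive
```

Scale the two URLs up to billions, add the rules for what NOT to crawl, and the head-spinning part becomes obvious.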

The key to better crawling has always been finding all the URLs to crawl faster and more accurately. That is why a site map is recommended. A list of all URLs to crawl gets all the URLs into the scheduler sooner. The problem is that we now have 8-million-page sites (Amazon and eBay) and millions of sites.

As an example of the problems this causes, if a site has 100,000 pages, how many pages need to be crawled before a search engine knows about them all? The bottom 10,000 pages of such a site will probably be linked to once from a category page itself several links deep. That means Google need to crawl many levels of a site before finding all the pages, and that means to find the web's freshest and newest content, like, say, a new printer, Google need to crawl deep and often. That is a heck of a lot more pages to crawl to find all URLs on a site than the 1 XML file SiteMaps offers, that is for sure.
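The discovery cost described above can be sketched with a breadth-first crawl over a made-up site. Everything here is an assumption for illustration: the site layout (home page, 10 category pages, 100 products per category) and the function names are hypothetical, and a real crawler is far more complicated.

```python
from collections import deque

def crawl_discovery_cost(links, start):
    """Breadth-first crawl of a link graph (dict: page -> list of linked pages).

    Returns (fetches needed before every URL is known,
             total URLs discovered,
             deepest link depth).
    """
    depth = {start: 0}
    queue = deque([start])
    fetched = 0
    fetches_at_last_discovery = 0
    while queue:
        page = queue.popleft()
        fetched += 1  # a page must be downloaded before its links can be read
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
                fetches_at_last_discovery = fetched
    return fetches_at_last_discovery, len(depth), max(depth.values())

def build_toy_site(categories, products_per_category):
    """Hypothetical site: home page -> category pages -> product pages."""
    links = {"/": [f"/cat{c}" for c in range(categories)]}
    for c in range(categories):
        links[f"/cat{c}"] = [f"/cat{c}/item{p}" for p in range(products_per_category)]
    return links

site = build_toy_site(categories=10, products_per_category=100)
print(crawl_discovery_cost(site, "/"))  # (11, 1011, 2)
```

Even on this tiny, shallow site the crawler has to fetch 11 pages before it has seen every URL, and it cannot know it is finished without fetching the rest. With a sitemap, the same list arrives in a single file.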

No more!

While the old way was bandwidth intensive and, therefore, costly to both sides, now when Google arrives at a site, it has a nice neat list of all pages and, if people are smart, all NEW pages, and starts with the new and works down. SiteMaps has the potential to reduce costs for Google and site owners without sacrificing quality or freshness.

How good is that? What this means for site owners is that rather than a top down crawl, starting at the Home Page, we might see bottom then top then middle, e.g. specific products, home page, categories. No more "I have 10,000 pages and only 50 are indexed". No more Robots all over a site constantly. Now sites will have closer to 100% of pages indexed, and new pages indexed sooner, with a fresher refresh date for changing content. Assuming, of course, this all works out...

It also means that rather than Google crawling daily / weekly / monthly 100,000 pages of large sites, they can crawl less often with more certainty. That, in my book, equals less bandwidth expense per page for Google, less bandwidth expense overall for both sides (the W3C standards page that never changes will get crawled once in a blue moon, as they can set their recrawl to "Never") and a fresher, more complete index.
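For the curious, here is a rough sketch of what writing such a file might look like, including a page whose recrawl hint is set to "never". One loud assumption: I am using the element names and namespace of the later sitemaps.org 0.9 schema, not necessarily the exact format Google shipped when this was written. The function name and the example.com URLs are made up.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace, which postdates this article.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def build_sitemap(entries):
    """entries: iterable of (url, lastmod as 'YYYY-MM-DD', changefreq).

    Returns the sitemap as an XML string.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in entries:
        if changefreq not in VALID_CHANGEFREQ:
            raise ValueError(f"unknown changefreq: {changefreq!r}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap([
    ("http://www.example.com/", "2005-06-09", "daily"),
    # The standards page that never changes: ask to be left alone.
    ("http://www.example.com/standards.html", "2003-01-01", "never"),
]))
```

Keep in mind that the change frequency is only a hint to the crawler, so "never" is a request, not a guarantee.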

If sites get this right (and they have every reason to), they can now start to tell Google which pages have changed. Have you had a product recalled? Put it in your "new pages" sitemap and it will get updated sooner. Product no longer sold? Ditto.
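Picking the pages for a "new pages" sitemap could be as simple as the sketch below, assuming the site already tracks a last-modified date for each URL. The function name, the URLs and the dates are invented for the example.

```python
from datetime import date

def changed_since(pages, cutoff):
    """pages: dict mapping URL -> date the page was last modified.

    Returns the URLs modified after `cutoff`, newest first, which is the
    order a crawler would ideally work through them.
    """
    recent = [(modified, url) for url, modified in pages.items() if modified > cutoff]
    recent.sort(reverse=True)  # newest modification date first
    return [url for _, url in recent]

pages = {
    "/products/recalled-widget": date(2005, 6, 8),   # recall notice just added
    "/products/old-printer": date(2004, 1, 15),      # unchanged for ages
    "/products/new-printer": date(2005, 6, 9),       # brand new page
}
print(changed_since(pages, cutoff=date(2005, 6, 1)))
# ['/products/new-printer', '/products/recalled-widget']
```

The recalled product and the new printer make the list, while the page that has not changed stays out of the crawler's way.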

This last point makes for a better Google index, IMHO. Finding changed, deep content, and new, fresh content sooner is fantastic all round! I don't know about anyone else, but if I were Google, being better is something I would want, not least because the PR potential is huge and Gates is crying about Google's number of tricks and taking them out of the picture.

As an aside, many moons ago, GoogleGuy made a comment @ Webmasterworld joking that "I put a page up 7 seconds ago and Google hasn't found it - What is going on?" At the time, I thought "funny, how can Google ever get to that stage? To do so would need a recrawl rate of every page every five seconds, or Webmaster help". Well, here is that help! Google should know about a site's newest pages ASAP, and webmasters can get timely content in faster and easier. Win (Google) - Win (website owner) - Win (searchers).

So, my $0.02 (for what that is worth): no conspiracies, no ulterior motives, just good old fashioned business logic. "Reduced costs == increased profits, a better product == market leadership and better PR, innovation == keeping Microsoft in third place". Personally, I wish *I* had an idea for my business that reduced my costs, improved my products AND generated solid PR and plenty of press. I can't imagine anyone would need any more motivation for an idea than that.

So, rather than look for the cloud behind this silver lining, I reckon we give it a collective go, and see what happens. A world in which webmasters and crawlers are partners rather than enemies is good for all. Personally, the only issue I have is that, in true US tradition (sorry, had to chuck that in) this is a unilateral exercise. I just wish, as with the nofollow link attribute, that Google had consulted the W3C and that this was a ratified standard that had multi-vendor buy-in.



If you want to learn more about Google Sitemaps, here's the main What is Google Sitemaps page, and here's a page that covers Frequently Asked Questions. Hope those help illuminate this interesting topic!


Reader Comments To Date: 2

Sarah King said, on June 9, 2005 3:52 AM:

A huge benefit you won't see many people talking about, but should be, is that you no longer have to trade off a site's usability against its spiderability.

I find myself having to code 2 paths to information on a client site so that they can have their cake and eat it. If the XML feed is taken up by all SEs then this separates the users into two camps.

Of course it's also very cheeky, putting the onus onto the webmaster to have and maintain the XML feed.

It's been a long time coming! Freshbot was canned a long time ago and their technique for identifying fresh content (using the server headers) is defunct because dynamic content tends not to set them. This gives the spiders a resource-friendly (for both the webmaster and Google) way of identifying what's new and what isn't.

Anand Betanabhotla said, on March 25, 2006 12:00 AM:

Great Site. I have a question, please. I need to display latest science news on my site from rss feeds without server side code. How can I do this? All rss to javascript codes seem to have some server side functionality.

© 2002 - 2013 by Dave Taylor. All Rights Reserved.

Note: This web site is for the purpose of disseminating information for educational purposes, free of charge, for the benefit of all visitors. We take great care to provide quality information. However, we do not guarantee, and accept no legal liability whatsoever arising from or connected to, the accuracy, reliability, currency or completeness of any material contained on this web site or on any linked site. Further, please note that by submitting a question or comment you're agreeing to my terms of service, which are: you relinquish any subsequent rights of ownership to your material by submitting it on this site. My lawyer says "Thanks".
"Ask Dave Taylor®" is a registered trademark of Intuitive Systems, LLC.