A core question for anyone on the Web, and certainly a question you should be asking if you’re trying to monetize your Web site, is how the heck does Google figure out what sites are more relevant to a given search than others?
To get the answer, let’s go back in time a little bit and look at the research papers from a Stanford University project called “BackRub”. You should certainly recognize the authors…
The BackRub project, of course, was done by two Stanford graduate students, Sergey Brin and Lawrence Page, and subsequently evolved into Google, the search site and company we all love and from which we all wish we had IPO stock.
Reading the early research reports is surprisingly informative, particularly The Anatomy of a Large-Scale Hypertextual Web Search Engine, in which Brin and Page explain the fundamental idea behind Google: for any given word or phrase, matching Web pages can be ranked for relevance by using something that they called PageRank.
Here’s what they have to say about this topic:
“The citation (link) graph of the web is an important resource that has largely gone unused in existing web search engines. We have created maps .. [that] allow rapid calculation of a web page’s “PageRank”, an objective measure of its citation importance that corresponds well with people’s subjective idea of importance. Because of this correspondence, PageRank is an excellent way to prioritize the results of web keyword searches. For most popular subjects, a simple text matching search that is restricted to web page titles performs admirably when PageRank prioritizes the results.”
Much more interesting than that, however, is the remarkably simple formula that they use to calculate PageRank in this first generation of Google. A page’s score is based almost completely on how many other pages point to that page.
Simple, but remarkably elegant: the more links that point to a given page, the more relevant that page must be. Further, take into account the words used to link to a site, and add the title tag of the page itself and you begin to have a pretty decent idea of the theoretical relevance and value of a given site.
If you like mathematical formulas, you’ll like this too:
We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.
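If you’d rather see that formula run than read it, here’s a minimal Python sketch that applies it over and over to a tiny made-up Web of three pages. The function name `pagerank`, the `toy_web` graph, the starting value of 1.0 and the fixed count of 50 passes are all my own assumptions for illustration, not anything taken from the paper or from Google. It also assumes every page has at least one outbound link. One thing worth noticing: with the formula exactly as printed above, the scores in a graph like this add up to the number of pages rather than to one. Dividing the (1-d) term by the number of pages would make them sum to one instead.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)).

    `links` maps each page name to the list of pages it links to.
    Assumes every page has at least one outbound link and that every
    link target is itself a key in `links`.
    """
    pages = list(links)
    # Hypothetical starting point: give every page the same score.
    ranks = {page: 1.0 for page in pages}
    # C(T): number of links going out of each page.
    out_count = {page: len(targets) for page, targets in links.items()}

    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # T1...Tn: the pages that point to this page.
            inbound = [src for src in pages if page in links[src]]
            total = sum(ranks[src] / out_count[src] for src in inbound)
            new_ranks[page] = (1 - d) + d * total
        ranks = new_ranks
    return ranks


# A made-up three-page Web: "home" is linked from both other pages.
toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, "home" comes out on top because both other pages link to it, and "blog" comes out last because only one page links to it and that page splits its vote between two links.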
Of course, the modern Google PageRank algorithm takes over 100 different variables into account, but the basic concept still holds: the more links you have pointing to you, the better your PageRank. The better your PageRank, the more relevant your site will be for specific searches, and the more relevant your site, the higher you’ll show in the search results and the more traffic you’ll garner from Google searches!
There are a number of different ways to get more inbound links, as they’re called, to help boost your PageRank. If you’re eager to improve your own PageRank, though, an even easier place to start is my article entitled The hidden importance of your page TITLE.