How does Google figure out what pages are more relevant? Pagerank.

A core question for anyone on the Web, and certainly one you should be asking if you’re trying to monetize your Web site, is this: how the heck does Google figure out which sites are more relevant to a given search than others?
To get the answer, let’s go back in time a little bit and look at the research papers from a Stanford University project called “BackRub”. You should certainly recognize the authors…

The BackRub project, of course, was done by two Stanford graduate students, Sergey Brin and Lawrence Page, and subsequently evolved into Google, the search site and company we all love and from which we all wish we had IPO stock.
Reading the early research reports is surprisingly informative, particularly The Anatomy of a Large-Scale Hypertextual Web Search Engine, in which Brin and Page explain that the fundamental idea behind Google is that, for any given word or phrase, matching Web pages can be ranked for relevance by using something they called PageRank.
Here’s what they have to say about this topic:

“The citation (link) graph of the web is an important resource that has largely gone unused in existing web search engines. We have created maps .. [that] allow rapid calculation of a web page’s “PageRank”, an objective measure of its citation importance that corresponds well with people’s subjective idea of importance. Because of this correspondence, PageRank is an excellent way to prioritize the results of web keyword searches. For most popular subjects, a simple text matching search that is restricted to web page titles performs admirably when PageRank prioritizes the results.”

Much more interesting than that, however, is the remarkably simple formula that they used to calculate PageRank in this first generation of Google, which is based almost completely on which other pages link to a given page.
Simple, but remarkably elegant: the more links that point to a given page, and the more important the pages those links come from, the more relevant that page must be. Further, take into account the words used to link to a page, add the title tag of the page itself, and you begin to have a pretty decent idea of the theoretical relevance and value of a given site.
If you like mathematical formulas, you’ll like this too:

We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.

Impressive, eh?
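
If you'd rather see the formula run than read it, here's a minimal Python sketch of it, applied repeatedly to a tiny made-up web of three pages. The function name `pagerank`, the `toy_web` link map, the starting score of 1.0, and the fixed number of passes are all my own assumptions for illustration; none of this is Google's actual code.

```python
def pagerank(links, d=0.85, iterations=100):
    """Repeatedly apply PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    `links` maps each page to the list of pages it links out to
    (assumed here to contain no duplicate links).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Assumption: start every page with the same score of 1.0.
    pr = {page: 1.0 for page in pages}

    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page T that links here passes along PR(T) / C(T),
            # where C(T) is the number of links going out of T.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy web, C comes out on top because both other pages link to it, and A beats B because A's single inbound link comes from the highly ranked C. One caveat: with the formula exactly as quoted, the scores in this sketch add up to the number of pages (three here) rather than to one. Dividing the (1-d) term by the page count is the usual adjustment if you want a true probability distribution.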
Of course, Google’s modern ranking algorithm takes over 100 different variables into account, but the basic concept still holds: the more links you have pointing to you, the better your PageRank. The better your PageRank, the more relevant your site will appear for specific searches, and the more relevant it appears, the higher you’ll show in the search results and the more traffic you’ll garner from Google searches!
There are a number of different ways to get more inbound links, as they’re called, to help boost your PageRank, but an even easier place to start if you’re eager to improve your own PageRank is to read my article entitled The hidden importance of your page TITLE.
Another good strategy: subscribe to my XML Feed with an RSS reader (learn more about RSS), and you’ll have my articles come to your computer without any further effort!

About the Author: Dave Taylor has been involved with the online world since the early days of the Internet. Author of over 20 technical books, he runs the popular AskDaveTaylor.com tech help site. You can also find his gadget reviews on YouTube and chat with him on Twitter as @DaveTaylor.

6 comments on “How does Google figure out what pages are more relevant? Pagerank.”

Ask Dave Taylor! says:

November 3, 2004 at 8:36 pm

How do I map XX.com to http://www.XX.com in Apache?

I’ve been keeping an eye on a very interesting discussion about the difference between the Google pagerank of pages on a “www” domain name and the same page without the “www” prefix (e.g. “www.intuitive.com/index.html” versus “intuitive.com/index.html”…

Dave Taylor's Booktalk says:

November 2, 2004 at 5:30 am

How do I map XX.com to http://www.XX.com in Apache?

Dave Taylor says:

August 30, 2004 at 3:14 am

Agreed, but I imagine it’s just a matter of time before people are trying to disassemble and reverse-engineer MSN’s BlockRank too. And we’ll write about that here at http://www.free-web-money.com/ too. 🙂

Anthony Parsons says:

August 30, 2004 at 2:17 am

MSN’s BlockRank algo will be a little more interesting than PageRank IMO, once they have it up and running.

aaron wall says:

June 12, 2004 at 2:13 am

PageRank actually is not weighted very heavily in Google’s current ranking algorithm. When they talk about PageRank all they are talking about is the equation you listed above.

When they talk about 100 factors in their ranking algorithm it is a bit more complex than just PageRank.

Currently the #1 most important ranking factor for competitive terms is inbound link text.

that is why mr vi-agra and mr phen-tamine have probably visited your site a few times (I know they have mine)…

The Intuitive Life says:

June 1, 2004 at 3:35 am

Learn more about Google in the Houston Business Review

Alright, the article is actually one that I originally wrote for Free Web Money with the cheery title of How does Google figure out which pages are more relevant? Pagerank., but I’m pleased that the Houston Business Review picked it up for their May 20…
