The Implementation of PageRank in the Google Search Engine
Regarding the implementation of PageRank, first of all, it is important how PageRank is integrated into the general ranking of web pages by the Google search engine. The proceedings have been described by Lawrencec Page and Sergey Brin in several publications. Initially, the ranking of web pages by the Google search engine was determined by three factors:
|Page specific factors|
|Anchor text of inbound links|
Page specific factors are, besides the body text, for instance the content of the title tag or the URL of the document. It is more than likely that since the publications of Page and Brin more factors have joined the ranking methods of the Google search engine. But this shall not be of interest here.
In order to provide search results, Google computes an IR score out of page specific factors and the anchor text of inbound links of a page, which is weighted by position and accentuation of the search term within the document. This way the relevance of a document for a query is determined. The IR-score is then combined with PageRank as an indicator for the general importance of the page. To combine the IR score with PageRank the two values are multiplicated. It is obvious that they cannot be added, since otherwise pages with a very high PageRank would rank high in search results even if the page is not related to the search query.
Especially for queries consisting of two or more search terms, there is a far bigger influence of the content related ranking criteria, whereas the impact of PageRank is mainly visible for unspecific single word queries. If webmasters target search phrases of two or more words it is possible for them to achieve better rankings than pages with high PageRank by means of classical search engine optimisation.
If pages are optimised for highly competitive search terms, it is essential for good rankings to have a high PageRank, even if a page is well optimised in terms of classical search engine optimisation. The reason therefore is that the increase of IR score deminishes the more often the keyword occurs within the document or the anchor texts of inbound links to avoid spam by extensive keyword repetition. Thereby, the potentialities of classical search engine optimisation are limited and PageRank becomes the decisive factor in highly competitive areas.
The PageRank Display of the Google Toolbar
PageRank became widely known by the PageRank display of the Google Toolbar. The Google Toolbar is a browser plug-in for Microsoft Internet Explorer which can be downloaded from the Google web site. The Google Toolbar provides some features for searching Google more comfortably.
The Google Toolbar displays PageRank on a scale from 0 to 10. First of all, the PageRank of an actually visited page can be estimated by the width of the green bar within the display. If the user holds his mouse over the display, the Toolbar also shows the PageRank value. Caution: The PageRank display is one of the advanced features of the Google Toolbar. And if those advanced features are enabled, Google collects usage data. Additionally, the Toolbar is self-updating and the user is not informed about updates. So, Google has access to the user's hard drive.
If we take into account that PageRank can theoretically have a maximum value of up to dN+(1-d), where N is the total number of web pages and d is usually set to 0.85, PageRank has to be scaled for the display on the Google Toolbar. It is generally assumed that the scalation is not linearly but logarithmically. At a damping factor of 0.85 and, therefore, a minimum PageRank of 0.15 and at an assumed logaritmical basis of 6 we get a scalation as follows:
|10/10||9,069,926.4||-||0.85 × N + 0.15|
It is uncertain if in fact a logarithmical scalation in a strictly mathematical sense takes place. There is likely a manual scalation which follows a logarithmical scheme, so that Google has control over the number of pages within the single Toolbar PageRank ranges. The logarithmical basis for this scheme should be between 6 and 7, which can for instance be rudimentary deduced from the number of inbound links of pages with a high Toolbar PageRank from pages with a Toolbar PageRank higher than 4, which are shown by Googe using the link command.
The Toolbar's PageRank Files
Even webmasters who do not want to use the Google Toolbar or the Internet Explorer permanently for security and privacy concerns have the possibility to check the PageRank values of their pages. Google submits PageRank values in simple text files to the Toolbar. In former times, this happened via XML. The switch to text files occured in August 2002.
The PageRank files can be requested directly from the domain www.google.com. Basically, the URLs for those files look like follows (without line breaks):
There is only one line of text in the PageRank files. The last cipher in this line is PageRank.
The parameters incorporated in the above shown URL are inevitable for the display of the PageRank files in a browser. The value "navclient-auto" for the parameter "client" identifies the Toolbar. Via the parameter "q" the URL is submitted. The value "Rank" for the parameter "features" determines that the PageRank files are requested. If it is omitted, Google's servers still transmit XML files. The parameter "ch" transfers a checksum for the URL to Google, whereby this checksum can only change when the Toolbar version is updated by Google.
Thus, it is necessary to install the Toolbar at least once to find out about the checksum of one's URLs. To track the communication between the Toolbar and Google, often the use of packet sniffers, local proxies an similar tools is suggested. But this is not necessarily needed, since the PageRank files are cached by the Internet Explorer. So, the checksums can simply been found out by having a look at the folder Temporary Internet Files. Knowing the checksums of your URLs, you can view the PageRank files in your browser and you do not have to accept Google's 36 years lasting cookies.
Since the PageRank files are kept in the browser cache and, thus, are clearly visible, and as long as requests are not automated, watching the PageRank files in a browser should not be a violation of Google's Terms of Service. However, you should be cautious. The Toolbar submits its own User-Agent to Google. It is:
Mozilla/4.0 (compatible; GoogleToolbar 1.1.60-deleon; OS SE 4.10)
1.1.60-deleon is a Toolbar version which may of course change. OS is the operating system that you have installed. So, Google is able to identify requests by browsers, if they do not go out via a proxy and if the User-Agent is not modified accordingly.
Taking a look at IE's cache, one will normally notice that the PageRank files are not requested from the domain www.google.com but from IP addresses like 184.108.40.206. Additionally, the PageRank files' URLs often contain a parameter "failedip" that is set to values like "220.127.116.11;1111" (Its function is not absolutely clear). The IP addresses are each related to one of Google's seven data centers and the reason for the Toolbar querying IP-addresses is most likely to control the PageRank display in a better way, especially in times of the "Google Dance".
The PageRank Display at the Google Directory
Webmasters who do not want to check the PageRank files that are used by the toolbar have another possibility to receive information about the PageRank of their sites by means of the Google Directory (directory.google.com).
The Google Directory is a dump of the Open Directory Project (dmoz.org), which shows the PageRank for listed documents similarly to the Google Toolbar display scaled and by means of a green bar. In contrast to the Toolbar, the scale is from 1 to 7. The exact value is not displayed, but it can be determined by the divided bar respectively the width of the single graphics in the source code of the page if one is not sure by looking at the bar.
By comparing the Toolbar PageRank of a document with its Directory PageRank, a more exact estimation of a pages PageRank can be deduced, if the page is listed with the ODP. This connection was mentioned first by Chris Raimondi (www.searchnerd.com/pagerank).
Especially for pages with a Toolbar PageRank of 5 or 6, one can appraise if the page is on the upper or the lower end of its Toolbar scale. It shall be noted that for the comparison the Toolbar PageRank of 0 was not taken into account. It can easily be verified that this is appropriate by looking at pages with a Toolbar PageRank of 3. However, it has to be considered that for a verification pages of the Google Directory respectively the ODP with a Toolbar PageRank of 4 or lower have to be chosen, since otherwise no pages linked from there with a Toolbar PageRank of 3 will be found.
The Effect of Inbound Links
PageRank and Google are trademarks of Google Inc., Mountain View CA, USA. PageRank is protected by US Patent 6,285,999.
The content of this document may be reproduced on the web provided that a copyright notice is included and that there is a straight HTML hyperlink to the corresponding page at pr.efactory.de in direct context.
(c)2002/2003 eFactory GmbH & Co. KG Internet-Agentur - written by Markus Sobek