Computing principal eigenvectors of large web graphs: Algorithms and accelerations related to pagerank and hits
by Nagasinghe, Iranga, Ph.D., SOUTHERN METHODIST UNIVERSITY, 2010, 112 pages; 3403874

Abstract:

This thesis investigates and develops a few acceleration techniques for the search engine algorithms used in PageRank and HITS computations. PageRank and HITS methods are two highly successful applications of modern Linear Algebra in computer science and engineering. They constitute the essential technologies accounted for the immense growth and success of popular search engines present today, especially Google.

PageRank is a ranking method for the web pages based on the hyperlink structure of the world wide web. It is modelled as an eigenvalue problem, where the matrix related to the world wide web is huge, real, and nonsymmetric. Iterative procedures are employed to compute the principal eigenvector of the eigenproblem. The output upon convergence is a vector carrying the ranks of the web pages. The standard Power method is used for the iteration by the founders of PageRank. As is well-known, the Power method suffers from slow convergence if the gap-ratio of the eigenproblem is close to 1. To accelerate the convergence when the Power method converges slowly, we investigate a Rayleigh-Ritz based method, which we call the Rayleigh-SVD method, and an extrapolation method. For the Rayleigh-SVD method, we compute a singular vector related to the Rayleigh quotient that can contribute to faster convergence, instead of performing an eigen decomposition as in the standard Rayleigh-Ritz method. For the extrapolation based method, we truncate the terms that hinder the convergence of the Power method as a means for acceleration. A novel technique we develop is the combination of the Rayleigh-SVD method and the extrapolation method. Numerical experiments are presented to illustrate the speed up in convergence for the Rayleigh-SVD method, the extrapolation method, and a combination of the two.

HITS is another web ranking method similar to the idea of the PageRank model. It utilizes two vectors, the hub vector and the authority vector, instead of just one PageRank vector. Once again the Power method is the standard iterative procedure used in the computation of the HITS vectors. To accelerate the potentially slow convergence of the Power method, we develop a filtering method based on Chebyshev polynomials. The two matrices involved in the HITS model are real and symmetric, therefore real Chebyshev polynomials of the first kind are particularly useful and effective for the acceleration. We present numerical results that show the advantage of the filtered method over the Power method.

Several realistic web graphs by web crawlers (UbiCrawler) are used as test matrices for our algorithms. For the PageRank method, the nonsymmetric matrices are directly used to construct the PageRank model, while for the HITS method, symmetric matrices derived from the web graphs are used. Although the acceleration algorithms developed in this thesis are applied to search engine related models, they are applicable in many other fields where efficient computation of eigenvalues and eigenvectors of large matrices is necessary.

 
AdviserYunkai Zhou
SchoolSOUTHERN METHODIST UNIVERSITY
SourceDAI/B 71-06, p. , Jun 2010
Source TypeDissertation
SubjectsApplied mathematics; Information technology
Publication Number3403874
Adobe PDF Access the complete dissertation:
 

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3403874
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).

About ProQuest Dissertations & Theses
With over 2.3 million records, the ProQuest Dissertations & Theses (PQDT) database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.

The database includes citations of graduate works ranging from the first U.S. dissertation, accepted in 1861, to those accepted as recently as last semester. Of the 2.3 million graduate works included in the database, ProQuest offers more than 1.9 million in full text formats. Of those, over 860,000 are available in PDF format. More than 60,000 dissertations and theses are added to the database each year.

If you have questions, please feel free to visit the ProQuest Web site - http://www.proquest.com - or call ProQuest Hotline Customer Support at 1-800-521-3042.