Measurement of online social networks
by Gjoka, Mina, Ph.D., UNIVERSITY OF CALIFORNIA, IRVINE, 2010, 149 pages; 3432219

Abstract:

In recent years, the popularity of online social networks (OSNs) has risen to unprecedented levels, with the most popular ones having hundreds of millions of users. This success has generated interest within the networking community and has given rise to a number of measurement and characterization studies, which provide a first step towards understanding them. The large size and access limitations of most online social networks make it difficult to obtain a full view of users and their relations. Sampling methods are thus essential for practical estimation of OSN properties.

Key to OSN sampling schemes is the fact that users are, by definition, connected to one another via some relation. Therefore, samples of OSN users can be obtained by exploring the OSN social graph or graphs induced by other relations between users. While sampling can, in principle, allow precise inference from a relatively small number of observations, this depends critically on the ability to collect a sample with known statistical properties. An early family of measurement studies followed Breadth-First Search (BFS)-type approaches, in which all nodes of a graph reachable from an initial seed were explored exhaustively. In this thesis, we follow a more principled approach: we perform random walks on the social graph to collect uniform samples of OSN users, which are representative and appropriate for further statistical analysis.
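
A random walk that yields uniform samples of users can be sketched as follows. The abstract does not name the specific walk, so this uses one standard choice, the Metropolis-Hastings random walk (MHRW), as an illustrative assumption: propose a uniformly random neighbor and accept it with probability min(1, deg(current)/deg(candidate)), which makes the stationary distribution uniform over the nodes of a connected, undirected graph. The function name, the toy graph, and the burn-in length are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List

# Undirected graph as an adjacency list; every edge must appear in both directions.
Graph = Dict[Hashable, List[Hashable]]


def mhrw_sample(graph: Graph, seed_node: Hashable, n_steps: int,
                rng: random.Random) -> List[Hashable]:
    """Metropolis-Hastings random walk (illustrative sketch) whose stationary
    distribution is uniform over the nodes of a connected, undirected graph."""
    current = seed_node
    samples: List[Hashable] = []
    for _ in range(n_steps):
        candidate = rng.choice(graph[current])
        # Accept the proposed neighbor with probability
        # min(1, deg(current) / deg(candidate)); moves toward high-degree
        # nodes are rejected more often, which cancels the degree bias.
        if rng.random() < len(graph[current]) / len(graph[candidate]):
            current = candidate
        # On rejection the walk stays put and the current node is re-counted.
        samples.append(current)
    return samples


if __name__ == "__main__":
    # Hypothetical toy graph: hub 0 with five spokes, plus one extra edge 1-2.
    toy = {0: [1, 2, 3, 4, 5], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0], 5: [0]}
    walk = mhrw_sample(toy, seed_node=0, n_steps=200_000, rng=random.Random(1))
    burned = walk[1_000:]  # discard an (ad hoc) initial burn-in segment
    freq = Counter(burned)
    for node in sorted(toy):
        print(node, round(freq[node] / len(burned), 3))  # each should be near 1/6
```

For comparison, a plain random walk on the same toy graph would visit the hub about 5/12 of the time, since its stationary distribution is proportional to degree; MHRW removes that bias at the cost of some rejected moves.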

First, we provide an instructive comparison of different graph exploration techniques and apply a number of known but perhaps underutilized methods to this problem. We show that previously used BFS-type methods can produce biased samples with poor statistical properties when the full graph is not covered, while random walks perform remarkably well. We also demonstrate how to measure online convergence for random walk-based approaches. Second, we propose multigraph sampling, a novel technique that performs a random walk on a combination of OSN user relations. Performed properly, multigraph sampling can improve mixing time and yield an asymptotic probability sample of a target population even where no single connected relation on the same population is available. Third, we apply the presented methods to collect some of the first known unbiased samples of large-scale OSNs. An important part of this effort is the development of efficient crawlers that address the associated technical challenges. Using the collected datasets, we present characterization studies of Facebook and Last.fm. Finally, we present the first study to characterize the statistical properties of OSN applications and propose a method to model the application installation process.
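
The multigraph sampling idea can be illustrated with a minimal sketch under stated assumptions: the walk moves on the union of several relations with parallel edges kept, so each node is visited in proportion to its total degree across relations, and that bias is then removed by weighting each sample by 1/degree. These specifics, the function names, and the toy relations are assumptions for illustration, not the thesis's exact procedure. The toy example shows the key property from the abstract: neither relation is connected on its own, yet the walk on their union reaches every node.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence

# One undirected relation (e.g. friendship, group co-membership) as an
# adjacency list; each edge must be listed in both directions.
Relation = Dict[Hashable, List[Hashable]]


def multigraph_neighbors(relations: Sequence[Relation], node: Hashable) -> List[Hashable]:
    """Endpoints of all edges incident to `node` in the union multigraph.
    Parallel edges (the same pair linked in several relations) are kept."""
    out: List[Hashable] = []
    for rel in relations:
        out.extend(rel.get(node, []))
    return out


def multigraph_walk(relations: Sequence[Relation], start: Hashable,
                    n_steps: int, rng: random.Random) -> List[Hashable]:
    """Simple random walk on the union multigraph (assumed construction)."""
    current = start
    samples: List[Hashable] = []
    for _ in range(n_steps):
        # Uniform over incident multigraph edges: a neighbor linked by k
        # relations is k times as likely to be chosen.
        current = rng.choice(multigraph_neighbors(relations, current))
        samples.append(current)
    return samples


def reweighted_mean(samples: Sequence[Hashable], relations: Sequence[Relation],
                    f: Callable[[Hashable], float]) -> float:
    """Estimate the population mean of f over nodes. The walk visits nodes in
    proportion to their multigraph degree, so each sample is weighted by
    1 / degree (a ratio estimator) to undo that bias."""
    num = den = 0.0
    for x in samples:
        w = 1.0 / len(multigraph_neighbors(relations, x))
        num += w * f(x)
        den += w
    return num / den


if __name__ == "__main__":
    # Hypothetical relations on nodes 0..5: neither is connected on its own,
    # but their union is (0-1 appears in both, giving a parallel edge).
    friends = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
    groups = {0: [1], 1: [0], 2: [3], 3: [2], 4: [5], 5: [4]}
    rels = [friends, groups]
    walk = multigraph_walk(rels, start=0, n_steps=200_000, rng=random.Random(7))[1_000:]
    # Fraction of even-numbered nodes; the true value is 3/6 = 0.5.
    print(round(reweighted_mean(walk, rels, lambda v: 1.0 if v % 2 == 0 else 0.0), 3))
```

A walk restricted to `friends` alone and started at node 0 could never reach nodes 3, 4, or 5; combining relations is what makes the population reachable.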

 
Adviser: Athina Markopoulou
School: UNIVERSITY OF CALIFORNIA, IRVINE
Source: DAI/B 72-01, p. , Dec 2010
Source Type: Dissertation
Subjects: Computer engineering; Information science; Computer science
Publication Number: 3432219
Full citation record: http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3432219
