UMI  
ProQuest® Dissertations & Theses
The world's most comprehensive collection of dissertations and theses. Learn more...
ProQuest  
 
 
Diffusion in computer science and statistics
by Narayanan, Hariharan, Ph.D., THE UNIVERSITY OF CHICAGO, 2009, 220 pages; 3369375
 

Abstract:

In this thesis, we investigate diffusion as an algorithmic and analytic tool in statistics and computer science.

We address a question arising from computational linguistics, where we wish to understand the behavior of a network of agents modeled as nodes of a graph that adaptively modify their lexicon using data from their neighbors. By introducing a model of memory and a family of coalescing random walks, we prove that they eventually reach a consensus with probability 1.

We study distributed averaging on graphs and devise a distributed algorithm that is based on a diffusion process having two time scales.

Addressing the question of routing in a network, we use steady-state diffusions corresponding to electrical flow in a network of resistors for oblivious routing and prove that this scheme performs well under a variety of performance measures.

Based on a microscopic view of diffusion as an ensemble of particles executing independent Brownian motions, we develop the fastest currently known algorithm for computing the area of the boundary of a convex set. A similar technique is used to produce samplers for the boundaries of convex sets and smooth hypersurfaces that are the boundaries of open sets in Rn, assuming access to samplers for the interior. These algorithms are motivated by Goodness-of-Fit tests in statistics.

The halfplane capacity, a quantity often used to parameterize stochastic processes arising in statistical physics, known as Schramm-Loewner evolutions, is shown to be comparable to a more geometric notion.

We analyze a class of natural random walks on a Riemannian manifold, and give bounds on the mixing times in terms of the Cheeger constant and a notion of smoothness that relates the random walk to the metric underlying the manifold.

A Markov chain having a stationary distribution that is uniform on the interior of a polytope is developed. This is the first chain whose mixing time is strongly polynomial when initiated in the vicinity of the center of mass. This Markov chain can be interpreted as a random walk on a certain Riemannian manifold. The resulting algorithm for sampling polytopes outperforms known algorithms when the number of constraints is of the same order of magnitude as the dimension. We use a variant of this Markov chain to design a randomized version of Dikin's affine scaling algorithm for linear programming. We provide polynomial-time guarantees which do not exist for Dikin's algorithm.

Addressing a question from machine learning, under certain smoothness conditions, we prove that a form of weighted surface area is the limit of the weight of graph cuts in a family of random graphs arising in the context of clustering. This is done by relating both to the amount of diffusion across the surface in question.

Addressing a related issue on manifolds, we obtain an upper bound on the annealed entropy of the collection of open subsets of a manifold whose boundaries are well-conditioned. This result leads to an upper bound on the number of random samples needed before it is possible to accurately classify data lying on a manifold.

 
Advisor: Niyogi, Partha
School: THE UNIVERSITY OF CHICAGO
Source: DAI-B 70/08, p. , Feb 2010
Source Type: Ph.D.
Subjects: Statistics; Computer science
Publication Number: 3369375
     
Adobe PDF Access the complete dissertation:
 

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3369375
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).

 
 
 

About ProQuest Dissertations & Theses
With over 2.3 million records, the ProQuest Dissertations & Theses (PQDT) database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.

The database includes citations of graduate works ranging from the first U.S. dissertation, accepted in 1861, to those accepted as recently as last semester. Of the 2.3 million graduate works included in the database, ProQuest offers more than 1.9 million in full text formats. Of those, over 860,000 are available in PDF format. More than 60,000 dissertations and theses are added to the database each year.

If you have questions, please feel free to visit the ProQuest Web site - http://www.il.proquest.com - or call ProQuest Hotline Customer Support at 1-800-521-3042.



Copyright © 2007 ProQuest. All rights reserved. Terms and Conditions

ProQuest