Stochastic graph grammars: Parameter learning and applications
by Mukherjee, Sourav, Ph.D., UNIVERSITY OF MARYLAND, BALTIMORE COUNTY, 2010, 126 pages; 3422843

Abstract:

Although traditional data mining algorithms often assume the data to have a feature-vector representation, emerging domains for data mining such as bioinformatics and social networks require the underlying data to be represented as graphs. In this thesis, we present stochastic graph grammars as probability models suitable for mining graph databases.

We first consider the problem of parameter learning of stochastic graph grammars from training data, and observe that the only known solution to this problem, PEGG (Parameter Estimation for Graph Grammars), relies on the bottom up parsing of the training instances. Parsing can be prohibitively expensive for many real grammars. To overcome this difficulty, we present the QuickPEGG algorithm for learning the grammar parameters more efficiently. QuickPEGG utilizes the fact that for most grammars, the probability distributions of the number of nodes and the number of edges of the underlying graph distribution contain sufficient information to infer the grammar parameters, thus rendering parsing unnecessary for parameter learning.

We then consider applications of the learned grammar to knowledge discovery in the underlying domain. In the existing literature, most applications of graph grammars are dominated by sampling and parsing. However, we show that once the stochastic graph grammar has been learned, interesting statistics such as the degree distribution may be inferred from it. Because of this property, stochastic graph grammars may be used as generative models for useful families of networks, such as scale-free networks where the degree of a node follows a power law distribution.

In summary, our work establishes stochastic graph grammars as a framework suitable for mining graph databases; the algorithms presented in this thesis improve our ability to efficiently learn grammar parameters from training data. The applications shown in this thesis enable us to obtain a deeper understanding of the underlying graph distribution, once the grammar has been learned.

 
AdviserTim Oates
SchoolUNIVERSITY OF MARYLAND, BALTIMORE COUNTY
SourceDAI/B 71-10, p. , Oct 2010
Source TypeDissertation
SubjectsComputer science
Publication Number3422843
Adobe PDF Access the complete dissertation:
 

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3422843
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).

About ProQuest Dissertations & Theses
With over 2.3 million records, the ProQuest Dissertations & Theses (PQDT) database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.

The database includes citations of graduate works ranging from the first U.S. dissertation, accepted in 1861, to those accepted as recently as last semester. Of the 2.3 million graduate works included in the database, ProQuest offers more than 1.9 million in full text formats. Of those, over 860,000 are available in PDF format. More than 60,000 dissertations and theses are added to the database each year.

If you have questions, please feel free to visit the ProQuest Web site - http://www.proquest.com - or call ProQuest Hotline Customer Support at 1-800-521-3042.