Querying graph databases
by Tian, Yuanyuan, Ph.D., UNIVERSITY OF MICHIGAN, 2008, 151 pages; 3343235

Abstract:

Real life data can often be modeled as graphs, in which nodes represent objects and edges indicate their relationships. Large graph datasets are common in many emerging applications. To fully exploit the wealth of information encoded in graphs, systems for managing and analyzing graph data are critical.

To address the need of complex analysis on graph data, this thesis presents a graph querying toolkit, called Periscope/GQ. This toolkit is built on top of a commodity RDBMS. It provides a uniform schema for storing graphs and supports various graph query operations. Users can easily combine several operations to perform complex analysis on graphs.

The key feature of Periscope/GQ is the support of various sophisticated graph query operations besides the simple ones like node/edge selection and path search. In particular, this thesis focuses on two classes of sophisticated queries: graph matching and graph summarization.

The database community has largely focus on exact graph matching problems. However, due to the noisy and incomplete nature of real graph datasets, approximate, rather than exact graph matching is required. This thesis presents a novel approximate graph matching technique, called SAGA. SAGA employs a flexible graph similarity model and utilizes an index-based matching algorithm to efficiently evaluate matching queries.

SAGA is effective and efficient for small query graphs (with tens of nodes and edges), but is expensive when applied to large query graphs (with hundreds to thousands of nodes and edges). To handle large query graphs, TALE is proposed. TALE employs a novel indexing technique, which achieves high pruning power and scales linearly with the database sizes. The matching algorithm utilizes the index to first match the important nodes in the query, and then extends them to produce large graph matches.

Graph summarization techniques are useful for understanding underlying characteristics of graphs. To summarize large graphs, this thesis introduces an aggregation method. This method produces summary graphs by grouping nodes based on user-selected node attributes and relationships. It further allows users to control the resolutions of summaries, and provides the “drill-down” and “roll-up” abilities to navigate through summaries with different resolutions.

 
AdviserJignesh M. Patel
SchoolUNIVERSITY OF MICHIGAN
SourceDAI/B 70-01, p. , Mar 2009
Source TypeDissertation
SubjectsComputer science
Publication Number3343235
Adobe PDF Access the complete dissertation:
 

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3343235
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).

About ProQuest Dissertations & Theses
With over 2.3 million records, the ProQuest Dissertations & Theses (PQDT) database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.

The database includes citations of graduate works ranging from the first U.S. dissertation, accepted in 1861, to those accepted as recently as last semester. Of the 2.3 million graduate works included in the database, ProQuest offers more than 1.9 million in full text formats. Of those, over 860,000 are available in PDF format. More than 60,000 dissertations and theses are added to the database each year.

If you have questions, please feel free to visit the ProQuest Web site - http://www.proquest.com - or call ProQuest Hotline Customer Support at 1-800-521-3042.