Mining hidden associations in text corpora through concept chain and graph queries
by Jin, Wei, Ph.D., STATE UNIVERSITY OF NEW YORK AT BUFFALO, 2008, 127 pages; 3320467
 

Abstract:

The availability of large volumes of text documents has created the potential to discover valuable information hidden in those texts. This in turn has created the need for automated methods of discovering such information without having to read every document. The main theme of this dissertation rests on the hypothesis that the whole (the document collection) is greater than the sum of its parts (the individual documents). Interesting links and hidden information that connect facts, propositions or hypotheses can be found by using novel text mining techniques along with traditional data mining techniques. We refer to this research area as unapparent information revelation (UIR).

The goal of this dissertation is to automate techniques that will sift through these extensive document collections and find such links. Previous work in our UIR group has defined Concept Chain Queries (CCQ) and Concept Graph Queries (CGQ), special cases of text mining in document collections that focus on detecting links between two or more concepts across text documents. A concept chain query involving concepts A and B has the following meaning: find the most plausible relationships between concept A and concept B, assuming that one or more instances of both concepts occur in the corpus, but not necessarily in the same document. Unlike traditional search, a CCQ is interpreted as finding the best concept chain and evidence trail across multiple documents that connects the two concepts. CCQ can be extended to CGQ, where three or more concepts are involved.
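
The idea of a concept chain query can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it assumes that each document has already been reduced to a list of concepts, links two concepts whenever they co-occur in a document, and uses the shortest chain (breadth-first search) as a simple stand-in for "most plausible". The function names, the document identifiers and the toy corpus are all hypothetical.

```python
from collections import defaultdict, deque

def build_concept_graph(docs):
    """Link two concepts whenever they co-occur in a document.

    docs maps a document id to the list of concepts found in it.
    Each edge remembers the set of documents that support it.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for doc_id, concepts in docs.items():
        unique = sorted(set(concepts))
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                graph[a][b].add(doc_id)
                graph[b][a].add(doc_id)
    return graph

def concept_chain(graph, start, goal):
    """Return the shortest chain of concepts from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None

def evidence_trail(graph, chain):
    """List, for each link in the chain, the documents that support it."""
    return [(a, b, sorted(graph[a][b])) for a, b in zip(chain, chain[1:])]

# Toy corpus (hypothetical): A and B never appear in the same document.
docs = {
    "d1": ["A", "X"],
    "d2": ["X", "Y"],
    "d3": ["Y", "B"],
    "d4": ["A", "Z"],
}
graph = build_concept_graph(docs)
chain = concept_chain(graph, "A", "B")
trail = evidence_trail(graph, chain)
print(chain)   # ['A', 'X', 'Y', 'B']
print(trail)   # [('A', 'X', ['d1']), ('X', 'Y', ['d2']), ('Y', 'B', ['d3'])]
```

In this sketch the chain connects A to B through X and Y even though no single document mentions both, and the trail names the document behind each link. A realistic system would rank competing chains by some plausibility score rather than by length alone.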

In this dissertation, the UIR problem is approached from various perspectives. I adapt the traditional bag-of-words approach, the existing Association Rule Mining method and the Local Context Analysis technique to address this problem. Specifically, I show that it is possible to improve knowledge discovery in document collections by combining text retrieval and link analysis techniques. Additionally, an explanation of the retrieved chain (or graph), in the form of a cross-document evidence trail, is generated for further investigation. This evidence trail is a special case of a cross-document summary.
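
As a rough illustration of how association rule mining applies to a bag-of-words view of a corpus, the sketch below computes the standard support and confidence measures over documents treated as sets of terms. It shows only the textbook definitions, not the adaptation developed in the dissertation; the function names and the toy corpus are hypothetical.

```python
def support(docs, terms):
    """Fraction of documents that contain every term in `terms`."""
    terms = set(terms)
    return sum(1 for doc in docs if terms <= doc) / len(docs)

def confidence(docs, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the document collection."""
    base = support(docs, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(docs, set(antecedent) | set(consequent)) / base

# Toy corpus (hypothetical): each document is reduced to its set of terms.
docs = [
    {"a", "x"},
    {"a", "x", "y"},
    {"x", "y"},
    {"b", "y"},
]

sup_ax = support(docs, {"a", "x"})        # 2 of 4 documents
conf_a_x = confidence(docs, {"a"}, {"x"})  # every document with "a" also has "x"
conf_x_y = confidence(docs, {"x"}, {"y"})  # 2 of the 3 documents with "x" have "y"
print(sup_ax, conf_a_x, conf_x_y)
```

Rules with high support and confidence suggest term pairs that are strongly associated within documents; chaining such pairs across documents is one way the link-finding goal described above could be approached.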

Experiments on different data sets are presented that demonstrate the effectiveness of the new algorithm.

 
Advisor: Srihari, Rohini K.
School: STATE UNIVERSITY OF NEW YORK AT BUFFALO
Source: DAI-B 69/08, p. , Feb 2009
Source Type: Ph.D.
Subjects: Computer science
Publication Number: 3320467
     
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3320467