Beyond keywords: finding information more accurately and easily using natural language
by Lease, Matthew, Ph.D., BROWN UNIVERSITY, 2010, 118 pages; 3430102

Abstract:

Information retrieval (IR) has become a ubiquitous technology for quickly and easily finding information on a given topic amidst the wealth of digital content available today. This dissertation addresses search for written and spoken natural language documents, including news articles, Web pages, and spoken interviews. Effective model estimation is identified as a key problem, and several novel estimation techniques are presented and shown to significantly enhance search accuracy.

While search is typically performed via a few carefully chosen keywords, formulating effective keyword queries is often unintuitive and iterative, particularly when seeking complex information. As an alternative to keyword search, this dissertation investigates search using “natural” queries, such as questions or sentences a person might naturally articulate in communicating their information need to another person. By moving toward supporting natural queries, the communication burden is shifted from user query formulation to system interpretation of natural language. The challenge in enacting such a shift is enabling automatic IR systems to more effectively cope with natural language. To this end, several new estimation techniques for modeling natural queries are described. In comparison to a maximum likelihood baseline, 15-20% relative improvement in mean-average precision (MAP) is demonstrated without use of query expansion.

When an IR system discovers or is provided one or more feedback documents exemplifying a user’s information need, there is further opportunity to improve search accuracy by exploiting document contents for query expansion. However, since documents typically discuss multiple topics varying in importance and relevance to any information need, the system must again be able to effectively interpret verbose natural language. Consequently, an estimation method for leveraging such documents is presented and shown to yield state-of-the-art search accuracy. Depending on the base model employed, 15-85% relative MAP improvement is achieved.

When modeling higher-order lexical features or searching smaller document collections like cultural history archives, sparsity becomes particularly problematic for estimation. To cope with such sparsity, additional estimation methods are described which yield 5-20% relative improvement in MAP accuracy across varying conditions of query verbosity.
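
For readers unfamiliar with the metric, the relative MAP improvements quoted above can be read against the standard definition of mean average precision. The following is a minimal sketch of that textbook definition, not the dissertation's evaluation code; the function names and the toy document IDs are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant doc IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of per-query average precision over (ranking, relevant set) pairs."""
    scores = [average_precision(ranking, relevant) for ranking, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline`, e.g. 0.2 means +20%."""
    return (improved - baseline) / baseline


if __name__ == "__main__":
    # Toy data (hypothetical): two queries with made-up rankings and judgments.
    runs = [
        (["d1", "d2", "d3"], {"d1", "d3"}),
        (["d4", "d5", "d6"], {"d5"}),
    ]
    print(mean_average_precision(runs))
    print(relative_improvement(0.20, 0.24))
```

Under this definition, a "15-20% relative improvement" means the improved system's MAP is 1.15 to 1.20 times the baseline's MAP, rather than 15 to 20 absolute percentage points higher.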

 
Adviser: Eugene Charniak
School: BROWN UNIVERSITY
Source: DAI/B 71-11, p. , Nov 2010
Source Type: Dissertation
Subjects: Information technology; Information science; Artificial intelligence; Computer science
Publication Number: 3430102
Full citation record: http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3430102
