Similarity Search with Multimodal Data
by Wang, Zhe, Ph.D., PRINCETON UNIVERSITY, 2012, 132 pages; 3500000

Abstract:

Similarity search systems are designed to help people organize non-text multimedia data and find valuable information. Multimedia data intrinsically has multiple modalities (e.g., visual and audio features from video clips), which can be exploited to construct better search systems. Traditionally, various integration techniques have been used to aggregate multiple modalities. However, such algorithms do not scale well to large datasets. As multimedia data grows, it is a challenge to build a search system that handles large-scale multimodal data efficiently and provides users with the information they need.

The goal of this dissertation is to study how to effectively combine multiple modalities to implement similarity search systems for large datasets. I have carried out my study through three similarity search systems, each designed for a different application. Each system combines multiple modalities to help users find desired information quickly. With the VFerret system, I studied how to combine visual features with audio features for effective personal video search. With the Image Spam Detection System, I explored several aggregation methods for integrating multiple image spam filters to detect image spam. With my Product Navigation System, I studied how to combine text search with image similarity search to help users find desired products. This thesis has also studied a rank-based model that helps system designers construct more efficient large-scale multimodal similarity search systems.

Although a general solution to using multimodal data in a similarity search system is still unknown, this dissertation shows that it is possible to substantially improve search accuracy and efficiency by leveraging domain-specific knowledge of multimodal data. The VFerret system improves search accuracy from an average precision of 0.66 to 0.79 by combining visual and audio features. The Image Spam Detection System lowers the false positive rate from a previous result of 1% to 0.001%, while maintaining comparable detection rates, by combining multiple image filters intelligently. My Product Navigation System reduces the number of user clicks by 60% compared to traditional systems through a new method of combining text search with image similarity search. These results support further adoption and study of multimodal data in similarity search system designs.
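
For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how average precision is commonly computed for one ranked result list. It assumes the standard information-retrieval definition; the abstract does not say exactly how the dissertation computed its figures, and the function name and parameters here are hypothetical.

```python
from typing import Optional, Sequence


def average_precision(ranked_relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Average precision for a single query (standard IR definition, assumed).

    ranked_relevance: relevance flags for the returned items, best rank first.
    total_relevant: number of relevant items in the whole collection; if
        omitted, only the relevant items that were actually returned count.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


# Relevant items at ranks 1 and 3: (1/1 + 2/3) / 2
score = average_precision([True, False, True])
```

Under this definition a score rises when relevant items move closer to the top of the list, which is the kind of improvement the 0.66 to 0.79 figure describes.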

 
Adviser: Kai Li
School: PRINCETON UNIVERSITY
Source: DAI/B 73-07(E), p. , Apr 2012
Source Type: Dissertation
Subjects: Computer engineering; Computer science
Publication Number: 3500000
