Beyond nouns and verbs
by Gupta, Abhinav, Ph.D., UNIVERSITY OF MARYLAND, COLLEGE PARK, 2009, 141 pages; 3372850

Abstract:

During the past decade, computer vision research has focused on constructing image-based appearance models of objects and action classes using large databases of examples (positive and negative) and machine learning. Visual inference, however, involves not only detecting and recognizing objects and actions but also extracting rich relationships between objects and actions to form storylines or plots. These relationships also improve the recognition performance of appearance-based models. Instead of identifying individual objects and actions in isolation, systems that exploit these relationships improve recognition rates by augmenting appearance-based models with contextual models based on object-object, action-action and object-action relationships. In this thesis, we look at the problem of using contextual information for recognition from three different perspectives: (a) Representation of Contextual Models; (b) Role of language in learning semantic/contextual models; (c) Learning of contextual models from weakly labeled data.

Our work departs from the traditional view of visual and contextual learning, where individual detectors and relationships are learned separately. Our work focuses on simultaneous learning of visual appearance and contextual models from richly annotated, weakly labeled datasets. Specifically, we show how rich annotations can be utilized to constrain the learning of visually grounded models of nouns, prepositions and comparative adjectives from weakly labeled data. We also show how visually grounded models of prepositions and comparative adjectives can be utilized as contextual models for scene analysis. We also present storyline models for interpretation of videos. Storyline models go beyond pairwise contextual models and represent higher order constraints by allowing only specific possible action sequences (stories). Visual inference using storyline models involves inferring the “plot” of the video (sequence of actions) and recognizing individual activities in the plot.
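
As a rough illustration of how restricting inference to permitted action sequences can change recognition, the sketch below picks the highest-scoring sequence of actions whose consecutive pairs all appear in a set of allowed transitions. It is not the model developed in the thesis: the action names, the per-segment scores, the `allowed` set and the function `best_storyline` are all hypothetical, and a simple dynamic program over pairwise transitions stands in for the richer storyline representation the abstract describes.

```python
from typing import Dict, List, Set, Tuple


def best_storyline(
    scores: List[Dict[str, float]],
    allowed: Set[Tuple[str, str]],
) -> Tuple[List[str], float]:
    """Return the best-scoring action sequence that respects `allowed`.

    scores[t][a] is a (hypothetical) appearance score for action `a`
    in video segment `t`; `allowed` lists the permitted (previous, next)
    action pairs, i.e. a toy stand-in for storyline constraints.
    """
    # best[a] = (total score, path) of the best allowed sequence ending in a
    best = {a: (s, [a]) for a, s in scores[0].items()}
    for segment in scores[1:]:
        nxt = {}
        for action, s in segment.items():
            candidates = [
                (total + s, path + [action])
                for prev, (total, path) in best.items()
                if (prev, action) in allowed
            ]
            if candidates:
                nxt[action] = max(candidates, key=lambda c: c[0])
        best = nxt
    if not best:
        raise ValueError("no action sequence satisfies the constraints")
    total, path = max(best.values(), key=lambda c: c[0])
    return path, total


# Hypothetical appearance scores for three video segments.
scores = [
    {"pitch": 0.6, "hit": 0.3, "run": 0.1},
    {"pitch": 0.2, "hit": 0.35, "run": 0.45},
    {"pitch": 0.1, "hit": 0.2, "run": 0.7},
]
# Hypothetical storyline: a pitch is followed by a hit, a hit by a run.
allowed = {("pitch", "hit"), ("hit", "run")}

# Recognizing each segment in isolation takes the per-segment maximum.
independent = [max(seg, key=seg.get) for seg in scores]
storyline, total = best_storyline(scores, allowed)
print(independent)  # ['pitch', 'run', 'run']
print(storyline)    # ['pitch', 'hit', 'run']
```

In this toy example the middle segment is labeled "run" when scored in isolation, but the sequence constraint relabels it as "hit", because "pitch" followed directly by "run" is not a permitted sequence.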

 
Adviser: Larry Davis
School: UNIVERSITY OF MARYLAND, COLLEGE PARK
Source: DAI/B 70-09, p. , Nov 2009
Source Type: Dissertation
Subjects: Artificial intelligence; Computer science
Publication Number: 3372850

  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3372850