VastMM-Tag: Semantic Indexing and Browsing of Videos for E-Learning
by Morris, Mitchell J., Ph.D., COLUMBIA UNIVERSITY, 2012, 124 pages; 3494994

Abstract:

Quickly accessing the contents of a video is challenging for users, particularly for unstructured video, which contains no intentional shot boundaries, no chapters, and no apparent edited format. We approach this problem in the domain of lecture videos through the use of machine learning, to gather semantic information about the videos, and through user interface design, to enable users to fully utilize this new information.

First, we use machine learning techniques to gather the semantic information. We develop a system for rapid automatic semantic tagging using a heuristic-based feature selection algorithm called Sort-Merge, starting from large, heterogeneous initial sets of low-level features (cardinality greater than 1K).

We explore applying Sort-Merge to heterogeneous feature sets through two methods: early fusion and late fusion. Each takes a different approach to handling the different kinds of features in the heterogeneous set. We determine the most predictive feature sets for key-frame filters such as “has text”, “has computer source code”, or “has instructor motion”. Specifically, we explore the usefulness of Haar Wavelets, Fast Fourier Transforms, Color Coherence Vectors, Line Detectors, Ink Features, and Pan/Tilt/Zoom detectors. For evaluation, we introduce a “keeper” heuristic for feature sets, which provides a method of performance comparison against a baseline.
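The difference between early and late fusion can be illustrated with a minimal Python sketch. This is a generic illustration only, not the Sort-Merge algorithm or the dissertation's classifiers: the nearest-centroid model, the toy feature values, the feature-family names (`edge_feats`, `color_feats`), and the `weights` parameter are all assumptions made for the example.

```python
from statistics import mean

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]

def fit(X, y):
    """Nearest-centroid model: one centroid per class label (0 or 1)."""
    return {label: centroid([x for x, t in zip(X, y) if t == label])
            for label in (0, 1)}

def score(model, x):
    """Positive score means x is closer to the class-1 centroid."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, model[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, model[1]))
    return d0 - d1

# Hypothetical toy data: two feature families per key frame.
edge_feats = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
color_feats = [[0.7], [0.6], [0.3], [0.2]]
labels = [1, 1, 0, 0]  # 1 = "has text", 0 = "no text" (assumed labels)

def early_fusion_predict(x_edge, x_color):
    """Early fusion: concatenate all families, then train one model."""
    X = [e + c for e, c in zip(edge_feats, color_feats)]
    model = fit(X, labels)
    return int(score(model, x_edge + x_color) > 0)

def late_fusion_predict(x_edge, x_color, weights=(1.0, 1.0)):
    """Late fusion: train one model per family, then combine the scores."""
    m_edge = fit(edge_feats, labels)
    m_color = fit(color_feats, labels)
    total = (weights[0] * score(m_edge, x_edge)
             + weights[1] * score(m_color, x_color))
    return int(total > 0)
```

With equal weights the two functions agree on this toy data, because unweighted squared distances simply add across feature families. They diverge once each family is weighted or normalized separately, which is the flexibility late fusion offers.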

Second, we create a user interface to allow the user to make use of the semantic tags we gathered through our computer vision and machine learning process. The interface is integrated into an existing video browser, which detects shot-like boundaries and presents a multi-timeline view. The content within shot-like boundaries is represented by frames to which our new interface applies the generated semantic tags. Specifically, we make accessible the semantic concepts of 'text', 'code', 'presenter', and 'person motion.' The tags are detected in the simulated shots using the filters generated with our machine learning approach and are displayed to users using a user-customizable multi-timeline view. We also generate tags based on ASR-generated transcripts that have been limited to the words provided in the index of the course textbook. Each of these occurrences is aligned with the simulated shots. Each spoken word becomes a tag analogous to the visual concepts. A full Boolean algebra over the tags is provided to enable new composite tags such as 'text or code, but no presenter.'
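A composite tag such as 'text or code, but no presenter' can be sketched with Python sets. This is an illustrative sketch, not the browser's implementation: it assumes each tag is stored as the set of simulated-shot indices where it was detected, and the shot numbers and helper names (`tag_or`, `tag_and`, `tag_not`) are hypothetical.

```python
# Hypothetical universe of simulated shots, numbered 0..5.
shots = set(range(6))

# Assumed representation: each tag maps to the shots where it was detected.
tags = {
    "text": {0, 1, 4},
    "code": {2, 4},
    "presenter": {1, 3, 5},
}

def tag_or(a, b):
    """Shots carrying either tag."""
    return a | b

def tag_and(a, b):
    """Shots carrying both tags."""
    return a & b

def tag_not(a, universe):
    """Shots in the universe that do not carry the tag."""
    return universe - a

# 'text or code, but no presenter'
composite = tag_and(
    tag_or(tags["text"], tags["code"]),
    tag_not(tags["presenter"], shots),
)
print(sorted(composite))  # prints [0, 2, 4]
```

Because the result is itself a set of shots, a composite tag can be displayed on a timeline or combined further exactly like a base tag, which is what makes the algebra closed.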

Finally, we quantify the effectiveness of our features and our browser through user studies, both observational and task driven. We find that users who used the full suite of tools completed a search task in 60% of the time taken by users without access to tags. We find that when users are asked to perform search tasks they follow a nearly fixed pattern of accesses, alternating between the use of tags and Keyframes, or between the use of Word Bubbles and the media player. Based on user behavior and feedback, we redesigned the interface to spatially group interface components that are used together, removed unused components, and redesigned the display of Word Bubbles to match that of the Visual Tags. We found that users strongly preferred the Keyframe tool, as well as both kinds of tags. Users found the algebra either very useful or not useful at all.

 
Adviser: John R. Kender
School: COLUMBIA UNIVERSITY
Source: DAI/B 73-06, p. , Feb 2012
Source Type: Dissertation
Subjects: Educational technology; Computer science
Publication Number: 3494994
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3494994