The widespread availability of devices capable of capturing multimedia content in the form of images and videos has created large volumes of information that cannot be indexed with traditional text-based methods. The need for efficient representation, indexing, searching, and browsing of multimedia content has stimulated research on content-based multimedia retrieval (CBMR) systems. CBMR systems represent multimedia objects based on their low-level visual and audio content rather than describing them with text labels. The main limitations of existing representation methods include the high dimensionality of the feature descriptors and their inability to represent objects with nonhomogeneous content. To address these limitations, we propose a new approach to represent multimedia objects. In particular, this dissertation includes two main contributions. The first is a novel approach to extract and represent features in terms of Dominant Descriptors (DD). The second is a new approach to embed relational data (generated by the DD) in a Euclidean space.
Our DD approach is generic and can be used to represent color, texture, and audio content. It is a data-driven approach that represents multimedia objects in a compact and intuitive form as signatures with dominant components. Unlike traditional methods that describe content using simple statistics regardless of the homogeneity of the object, the DD approach identifies the optimal number of components to describe each object. Homogeneous objects are represented by feature descriptors with few components, while nonhomogeneous objects require a larger number of components. For the case of Dominant Texture Descriptors, we generalize our representation to include spatial constraints. We show that our approach can be used to generalize some of the widely used methods for texture and audio representation adopted in the MPEG-7 standard.
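As a rough illustration of the signature idea described above, the following sketch builds a variable-length signature of (component center, weight) pairs from the low-level feature vectors of one object. It is not the dissertation's extraction algorithm: the uniform grid quantization, the function name `dominant_signature`, and the parameters `bin_width` and `min_share` are all assumptions chosen only to show how a homogeneous object can end up with fewer components than a nonhomogeneous one.

```python
import numpy as np

def dominant_signature(features, bin_width=0.25, min_share=0.1):
    """Return (centers, weights) for the dominant components of one object.

    features: (n, d) array of low-level feature vectors scaled to [0, 1].
    bin_width, min_share: hypothetical parameters of this sketch only.
    """
    features = np.asarray(features, dtype=float)
    # Coarse uniform quantization: each vector falls into one grid cell.
    cells = np.floor(np.clip(features, 0.0, 1.0 - 1e-9) / bin_width).astype(int)
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = np.ravel(inverse)  # guard against NumPy versions returning 2-D
    shares = counts / counts.sum()
    # Keep only cells that hold a large enough share of the object.
    keep = np.flatnonzero(shares >= min_share)
    if keep.size == 0:
        keep = np.array([np.argmax(shares)])  # fall back to the largest cell
    centers = np.array([features[inverse == c].mean(axis=0) for c in keep])
    weights = shares[keep] / shares[keep].sum()  # renormalize to sum to 1
    return centers, weights

# Toy data: one homogeneous object and one with three distinct groups.
homogeneous = np.tile([0.1, 0.1, 0.1], (100, 1))
nonhomogeneous = np.vstack([
    np.tile([0.1, 0.1, 0.1], (40, 1)),
    np.tile([0.6, 0.6, 0.6], (30, 1)),
    np.tile([0.9, 0.1, 0.1], (30, 1)),
])
c_h, w_h = dominant_signature(homogeneous)
c_n, w_n = dominant_signature(nonhomogeneous)
print(len(w_h), len(w_n))
```

Because two such signatures can have different lengths, they cannot be compared coordinate by coordinate; a pairwise dissimilarity between signatures is needed instead.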
The description of content in terms of DD does not provide feature vectors. It only allows pairwise comparison of multimedia objects. That is, it generates relational data. However, to take advantage of robust machine learning methods and efficient indexing methods developed for vector representations of objects, dominant features need to be mapped to a vector space. The second main contribution of this dissertation addresses this issue and proposes a new method for mapping dominant descriptors into a Euclidean space. The proposed Euclidean embedding method is generic and data-driven, and it generates feature vectors that are interpretable and normalized. Our method is based on clustering the relational data to identify regions of the feature space that group similar samples. Then, for each cluster, a few anchor points are selected to generate cluster memberships that are used to map the data. We propose a strategy for anchor point selection that ensures that the mapped features preserve the ordinal structure of the relational space. By using crisp, fuzzy, or possibilistic memberships, the quality of the resulting embedding can be controlled to accommodate the application requirements.
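The embedding pipeline outlined above (cluster the relational data, pick anchors, map each object to its memberships) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method proposed in the dissertation: it uses plain k-medoids as the relational clustering step, takes a single anchor per cluster (the medoid), and computes fuzzy c-means style memberships with fuzzifier `m`. The function names `k_medoids` and `fuzzy_membership_embedding` are hypothetical, and the toy dissimilarity matrix is derived from 2-D points purely for convenience.

```python
import numpy as np

def k_medoids(D, k, n_iter=50, seed=0):
    """Plain k-medoids on a precomputed (n x n) dissimilarity matrix D."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign every object to its nearest current medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # The medoid minimizes total dissimilarity to its cluster members.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids

def fuzzy_membership_embedding(D, anchors, m=2.0, eps=1e-12):
    """Map each object to its fuzzy memberships relative to the anchors.

    Each row is non-negative and sums to 1, so the resulting vectors are
    normalized and each coordinate reads as "degree of belonging" to one
    anchor. Assumption: one anchor per cluster, FCM-style membership formula.
    """
    d = D[:, anchors] + eps              # avoid division by zero at anchors
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Toy relational data: only the pairwise dissimilarities D are used below.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

anchors = k_medoids(D, k=2)
X = fuzzy_membership_embedding(D, anchors)   # (10, 2) feature vectors
print(X.round(3))
```

In this sketch the anchor closest to an object always receives its largest membership, which is a weak form of order preservation. Replacing the membership formula with a hard 0/1 assignment would give the crisp variant; a possibilistic variant would drop the constraint that each row sums to 1.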
To validate the proposed feature representation and embedding, we developed a Content Based Video Retrieval (CBVR) prototype system called DOFER (Dominant Feature Retrieval). DOFER can create and index video collections based on visual and audio content. Our prototype incorporates existing methods for multimedia content representation as well as the new methods proposed in this dissertation. This allows us to compare the performance of different methods on content retrieval tasks. Using large collections of videos, we present experimental results to demonstrate that the proposed methods can improve the quality of a CBMR system compared to standard methods.