With the rapid growth of online music repositories, there is an increasing need for content-based automatic indexing to help users find the music they are looking for. Musical instrument recognition is one of its main subtasks. Recently, numerous successful approaches to feature extraction and selection have been proposed for instrument recognition in monophonic sounds. Unfortunately, none of these algorithms can be successfully applied to polyphonic sounds. Identifying musical instruments in polyphonic sounds therefore remains difficult and challenging, especially when harmonic partials overlap. This has stimulated research on musical sound separation and on the development of new features for content-based automatic music information retrieval. Based on recent research results in the classification of monophonic sounds and on studies in speech recognition, the Moving Picture Experts Group (MPEG) standardized a set of features of digital audio content for the purpose of interpreting the meaning of an audio signal. Most of these features take the form of a large matrix or a long vector, which is not suitable for traditional data mining algorithms, while the smaller features are not sufficient for instrument recognition in polyphonic sounds. Therefore, these acoustical features alone cannot be successfully applied to the classification of polyphonic sounds. They do, however, contain critical information that points to the signatures of musical instruments.
The ultimate goal of this thesis is to build a flexible query answering system for a musical database that retrieves all objects satisfying queries such as "find all musical pieces in pentatonic scale with a viola and piano where viola is playing for minimum 20 seconds and piano for minimum 10 seconds". To achieve that, a database of sounds covering the musical instruments allowed in queries has to be built first. This database has already been built as part of the music information retrieval system called MIRAI, and it contains about 4000 sounds taken from MUMS (McGill University Master Samples). The sounds are described in terms of standard musical features whose definitions are provided by MPEG-7, other features used in earlier, similar research, and new features proposed in this thesis. All these features have been implemented and tested for correctness. The database of musical sounds is used as a vehicle to construct several classifiers for automatic instrument recognition. In this thesis we limit our investigations to classifiers provided by WEKA and RSES (Rough Sets Exploration System). Their performance is compared against that of similar classifiers constructed from the same database projected onto MPEG-7 features only. The main problem addressed in this thesis is twofold: constructing a proper and sufficient set of features, so that the descriptions of musical sounds are guaranteed to differentiate them, and devising a mechanism for separating multiple instruments played simultaneously. The performance of the classifiers is checked with 3-fold and/or 10-fold cross-validation and with bootstrap procedures. The classifiers showing the best performance are adopted for automatic indexing of musical pieces by instrument. Each musical piece is seen as a segmented object in the musical database, with segments showing when each relevant instrument starts and stops playing.
This way the musical database can be represented as an FS-tree (Frame Segment Tree) structure. The query answering system should be seen as the interface to the FS-tree representation of the musical database. Its flexibility is based on the hierarchical structure representing all musical instruments: when a query fails, the query answering system generalizes it and checks the generalized query for success. Constructing the Flexible Query Answering System therefore requires building classifiers for automatic indexing of music by instrument from a training database in which instrument names are replaced by their generalizations.
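The generalize-on-failure behaviour described above can be illustrated with a small sketch. It is not the FS-tree implementation of this thesis: a plain dictionary stands in for the index, and the instrument hierarchy, piece names, segment times, and function names are all hypothetical. As a further simplification, every term of a failed query is generalized by one level at once, and overlapping segments are simply summed.

```python
# Hypothetical instrument hierarchy (child -> parent); names are illustrative only.
PARENT = {
    "viola": "bowed strings",
    "violin": "bowed strings",
    "bowed strings": "chordophone",
    "piano": "chordophone",
    "flute": "aerophone",
}

# Hypothetical index: piece -> list of (instrument, start_sec, end_sec) segments.
INDEX = {
    "piece_a": [("violin", 0.0, 25.0), ("piano", 5.0, 40.0)],
    "piece_b": [("flute", 0.0, 30.0)],
}

def ancestors(label):
    """Return the label followed by its generalizations, most specific first."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def total_duration(segments, label):
    """Seconds played by instruments that are `label` or a specialization of it."""
    return sum(end - start
               for inst, start, end in segments
               if label in ancestors(inst))

def matches(segments, requirements):
    """True if every (label, min_seconds) requirement is met by the segments."""
    return all(total_duration(segments, label) >= secs
               for label, secs in requirements)

def answer(requirements):
    """Try the query as given; on failure, generalize each term one level and retry."""
    current = list(requirements)
    while True:
        hits = sorted(p for p, segs in INDEX.items() if matches(segs, current))
        if hits:
            return current, hits
        generalized = [(PARENT.get(label, label), secs) for label, secs in current]
        if generalized == current:  # every term is already at the top of the hierarchy
            return current, []
        current = generalized
```

With this toy data, the query `answer([("viola", 20), ("piano", 10)])` fails as stated because no piece contains a viola, and then succeeds on `piece_a` once the terms are generalized to "bowed strings" and "chordophone".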
Sound separation is restricted to isolating harmonic polyphonic sounds, in which stable pitches are predictable. Previous research on the different aspects of musical instrument identification is reviewed in the order of the sound processing steps.
The complete process of answering user queries involves the following functionalities: segmenting a music piece into groups of frames; estimating the predominant pitch and isolating (subtracting) the corresponding sound by matching its harmonic features against the feature database; repeating this process until only noise remains; extracting features from the resulting monophonic sounds; performing classification; and storing the labels together with the music piece, in the form of an FS-tree, in an indexed database.
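The estimate, subtract, and repeat loop at the core of this process can be sketched as follows. This is a toy illustration under strong assumptions, not the separation algorithm of this thesis: a frame is reduced to a dictionary of spectral peaks (integer frequency in Hz mapped to amplitude), the predominant pitch is taken to be the peak whose harmonic series carries the most amplitude, partials of different sources are assumed not to overlap, and the template values and all names are hypothetical.

```python
N_HARMONICS = 4  # assumed number of partials considered per source

# Hypothetical feature database: instrument -> relative amplitudes of partials 1..4.
TEMPLATES = {
    "instrument_x": [1.0, 0.5, 0.3, 0.2],
    "instrument_y": [1.0, 0.5, 0.25, 0.0],
}

def harmonic_amplitudes(peaks, f0):
    """Amplitudes at the first N_HARMONICS multiples of f0 (0.0 where absent)."""
    return [peaks.get(k * f0, 0.0) for k in range(1, N_HARMONICS + 1)]

def predominant_pitch(peaks):
    """Pick the peak frequency whose harmonic series carries the most amplitude."""
    return max(sorted(peaks), key=lambda f0: sum(harmonic_amplitudes(peaks, f0)))

def classify(amplitudes):
    """Nearest template by squared distance, after scaling by the first partial."""
    rel = [a / amplitudes[0] for a in amplitudes]

    def dist(name):
        return sum((r - t) ** 2 for r, t in zip(rel, TEMPLATES[name]))

    return min(sorted(TEMPLATES), key=dist)

def index_frame(peaks, noise_floor=0.05):
    """Repeatedly estimate a pitch, label its source, and subtract its partials."""
    # Peaks at or below the noise floor are treated as noise and dropped up front.
    residual = {f: a for f, a in peaks.items() if a > noise_floor}
    labels = []
    while residual:  # stop once only noise remains
        f0 = predominant_pitch(residual)
        labels.append((f0, classify(harmonic_amplitudes(residual, f0))))
        for k in range(1, N_HARMONICS + 1):
            residual.pop(k * f0, None)  # remove the isolated sound's partials
    return labels
```

For a frame mixing a 200 Hz source with partials (1.0, 0.5, 0.3, 0.2) and a 310 Hz source with partials (0.8, 0.4, 0.2), the loop isolates the 200 Hz source first and the 310 Hz source second. The overlapping-partials case, which the text identifies as the hard one, is deliberately left out of this sketch.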
This work has implications for research in blind harmonic sound separation, fundamental frequency estimation, timbre identification, pattern recognition, classification, music annotation, and collaborative query answering.