UMI  
ProQuest® Dissertations & Theses
 
 
Analysis and automatic recognition of tones in Mandarin Chinese
by Surendran, Dinoj Randal, Ph.D., THE UNIVERSITY OF CHICAGO, 2007, 105 pages; 3287100
 

Abstract:

In tonal languages such as Mandarin Chinese, words are defined by their phonemic sequence and by the intonational patterns (tones) of their syllables.

To see if the problem of tone recognition is worth solving, we propose an information theoretic measure to compare the relative importance (Functional Load) of phonological contrasts in any language. Empirical calculations show that tones are at least as important as vowels for conveying information in Mandarin.
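The abstract does not give the formula for Functional Load. As a rough illustration only, one reading of an information-theoretic measure is the relative drop in word entropy when a contrast is merged. The toy corpus, the unigram word model, and the helper names (`functional_load`, `strip_tone`, `merge_vowels`) in this sketch are assumptions, not taken from the dissertation.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a distribution given as a Counter of counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def functional_load(words, merge):
    """Relative entropy lost when `merge` collapses a contrast.

    `words` is a list of word tokens; `merge` maps each word to the form it
    would have if the contrast were neutralized (both are illustrative).
    """
    before = entropy(Counter(words))
    after = entropy(Counter(merge(w) for w in words))
    return (before - after) / before

# Toy corpus of pinyin syllables with tone digits (made-up, equal frequencies).
corpus = ["ma1", "ma3", "mi1", "mi3"]

def strip_tone(w):
    """Neutralize all tone contrasts by dropping the tone digit."""
    return w.rstrip("12345")

def merge_vowels(w):
    """Neutralize the a/i vowel contrast by mapping i to a."""
    return w.replace("i", "a")

fl_tone = functional_load(corpus, strip_tone)     # 0.5 on this toy corpus
fl_vowel = functional_load(corpus, merge_vowels)  # 0.5 on this toy corpus
```

On this toy corpus both contrasts carry the same load because each halves the number of distinguishable words; a real comparison would need corpus frequencies for Mandarin.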

We then carry out a large and thorough investigation of possible acoustic features to recognize tones. This involves hundreds of experiments, each of which classifies over a hundred thousand syllables from ten hours of broadcast news speech.

We first determine a base set of features (based on pitch, duration, and overall intensity) that achieves a syllable classification rate of 58.9%.

Experiments on a subset of our data show that simple features based on energy in various frequency bands work better for tone recognition than those based on more complicated methods like harmonic-amplitude differences and glottal flow estimation. Further experiments determine a set of band energy features that improve classification accuracy to 63.7%, with the F score for Neutral Tone increasing from 0.345 to 0.619. This opens up a host of new features for future speech researchers in industry and academia to investigate and use.
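To make "energy in various frequency bands" concrete, a minimal sketch is to sum spectral power over a few frequency ranges for one frame of audio. The band edges, frame length, sample rate, and function names here are placeholders chosen for illustration; the abstract does not specify which bands the dissertation uses.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def band_energies(frame, sample_rate, band_edges_hz):
    """Sum spectral power inside each [low, high) band, with edges in Hz."""
    n = len(frame)
    power = power_spectrum(frame)
    energies = []
    for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        total = sum(p for k, p in enumerate(power)
                    if low <= k * sample_rate / n < high)
        energies.append(total)
    return energies

# Synthetic 1 kHz tone sampled at 8 kHz; the band edges are arbitrary.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
features = band_energies(frame, sample_rate, [0, 500, 1500, 4000])
```

For the synthetic tone, nearly all of the energy lands in the middle band. In practice one would compute such values per frame with an FFT and summarize them per syllable before classification.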

We investigate making additional use of context: even if we know the tones of the surrounding syllables, classification accuracy increases only to 67.2%. (This provides a useful upper bound for our experiments.) While we do not have such ideal contextual information, we can use estimates of it to increase accuracy to 65.0%.
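One simple way to picture the use of context is to append the tones of the neighboring syllables to each syllable's acoustic features. The sketch below is an assumption about how this could be done, not the dissertation's method; the label set, feature values, and helper names are hypothetical.

```python
TONES = [1, 2, 3, 4, 5]  # 5 stands for the neutral tone (assumed labeling)

def one_hot(tone):
    """Encode a tone label as a 0/1 list; None (utterance boundary) gives all zeros."""
    return [1.0 if tone == t else 0.0 for t in TONES]

def add_context(features, tones):
    """Append one-hot codes of the previous and next tone to each feature row."""
    augmented = []
    for i, feats in enumerate(features):
        prev_tone = tones[i - 1] if i > 0 else None
        next_tone = tones[i + 1] if i + 1 < len(tones) else None
        augmented.append(list(feats) + one_hot(prev_tone) + one_hot(next_tone))
    return augmented

acoustic = [[0.2, 0.5], [0.1, 0.9], [0.7, 0.3]]  # made-up per-syllable features
labels = [4, 1, 3]                               # tones of the three syllables
rows = add_context(acoustic, labels)
```

Passing the true tones as `tones` corresponds to the ideal case with known context; passing a classifier's first-pass predictions instead corresponds to using estimates of that context.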

Finally, we investigate the hypothesis that better articulated syllables are easier to recognize. On a small corpus of lab speech from Xu (1999), we classify syllables in focussed words with over 99% accuracy, and use this to improve classification accuracy of all syllables. However, in news broadcast speech, we find that while stronger syllables are recognized better, the difference is not enough to suggest an algorithm that makes use of it.

 
Advisor: Levow, Gina-Anne
School: THE UNIVERSITY OF CHICAGO
Source: DAI-B 68/10, p. , Apr 2008
Source Type: Ph.D.
Subjects: Linguistics; Computer science
Publication Number: 3287100
     
Access the complete dissertation:
 

» This is an open access dissertation.
  Use the link below to access the full text PDF of this graduate work:
  http://gradworks.umi.com/3287100.pdf

 
 
 
