Rapid Speaker Normalization and Adaptation with Applications to Automatic Evaluation of Children's Language Learning Skills
by Wang, Shizhen, Ph.D., UNIVERSITY OF CALIFORNIA, LOS ANGELES, 2010, 118 pages; 3431882

Abstract:

This dissertation investigates speaker variation in automatic speech recognition (ASR), with a focus on rapid speaker normalization and adaptation methods that use limited enrollment data from the speaker. The work concentrates on reducing spectral variation through frequency warping.

Two methods are developed: one based on supraglottal (vocal tract) resonances (formants), and the other on resonances of the subglottal airways. The first method reshapes (warps) the spectrum by aligning corresponding formant peaks. Because formant structure varies at several levels, regression-tree based phoneme- and state-level spectral peak alignment is studied for rapid speaker adaptation, using a linearization of the vocal tract length normalization (VTLN) technique. This method is investigated in a maximum likelihood linear regression (MLLR)-like framework, taking advantage of both the efficiency of frequency warping (VTLN) and the reliability of statistical estimation (MLLR). Two different regression classes are investigated: one based on phonetic classes (using combined knowledge and data-driven techniques) and the other based on Gaussian mixture classes.
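
As a rough illustration of the peak-alignment idea, the sketch below fits a single global warping factor by least squares from matched formant peaks and applies a piecewise-linear VTLN-style warp that keeps the Nyquist frequency fixed. It is not the dissertation's regression-tree method; the function names, the 8 kHz Nyquist frequency, the 0.875 knee, and the formant values are all assumptions chosen for the example.

```python
def estimate_warp_factor(speaker_peaks_hz, reference_peaks_hz):
    """Least-squares alpha such that alpha * speaker_peak ~= reference_peak.

    Assumes the two lists hold matched formant peaks (F1 with F1, and so on).
    """
    num = sum(s * r for s, r in zip(speaker_peaks_hz, reference_peaks_hz))
    den = sum(s * s for s in speaker_peaks_hz)
    return num / den


def piecewise_linear_warp(freq_hz, alpha, nyquist_hz=8000.0, knee=0.875):
    """Scale frequencies by alpha below a break point, then bend the line
    so that the Nyquist frequency maps onto itself."""
    # Break point chosen so the scaled segment never exceeds knee * Nyquist.
    f0 = knee * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= f0:
        return alpha * freq_hz
    upper_slope = (nyquist_hz - alpha * f0) / (nyquist_hz - f0)
    return alpha * f0 + upper_slope * (freq_hz - f0)


# Hypothetical formant peaks (Hz) for one speaker and a reference model.
speaker_peaks = [500.0, 1500.0, 2500.0]
reference_peaks = [550.0, 1650.0, 2750.0]

alpha = estimate_warp_factor(speaker_peaks, reference_peaks)
warped_peaks = [piecewise_linear_warp(f, alpha) for f in speaker_peaks]
```

In this toy setup the warped speaker peaks land on the reference peaks. A phoneme- or state-level variant would estimate a separate factor per regression class rather than one global value.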

The second approach uses subglottal resonances, which have been shown to affect the spectral properties of speech sounds. A reliable algorithm is developed to automatically estimate the second subglottal resonance (Sg2) from speech signals. The algorithm is calibrated on children's speech data with simultaneous accelerometer recordings, from which Sg2 frequencies can be measured directly. A cross-language study with bilingual Spanish-English children is performed to investigate whether Sg2 frequencies are independent of speech content and language. The study verifies that Sg2 is approximately constant for a given speaker and is therefore a good candidate for limited-data speaker normalization and cross-language adaptation. A speaker normalization method using Sg2 is then presented.
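
Because Sg2 is roughly constant for a speaker, a handful of estimates may be enough to set a warping factor. The sketch below shows one simple way this could look, taking the ratio of a reference Sg2 to the median of a speaker's estimates. This ratio rule is an assumption for illustration, not necessarily the normalization method presented in the dissertation, and every frequency value is made up.

```python
from statistics import median


def sg2_warp_factor(speaker_sg2_estimates_hz, reference_sg2_hz):
    """Warping factor from a few Sg2 estimates (hypothetical ratio rule).

    The median makes the factor robust to an occasional bad estimate.
    """
    speaker_sg2 = median(speaker_sg2_estimates_hz)
    return reference_sg2_hz / speaker_sg2


# Made-up per-utterance Sg2 estimates (Hz); the last one is an outlier.
estimates = [1980.0, 2010.0, 2000.0, 2005.0, 1700.0]
factor = sg2_warp_factor(estimates, reference_sg2_hz=1500.0)
```

Here the outlier does not move the median, so the factor depends only on the consistent estimates.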

As an application, ASR techniques are applied to automatically evaluate children's phonemic awareness through three blending tasks (phoneme blending, onset-rhyme blending and syllable blending). The system incorporates speaker normalization, disfluency detection and Spanish accent detection, together with speech recognition to assess the overall quality of children's speech productions.

 
Adviser: Abeer Alwan
School: UNIVERSITY OF CALIFORNIA, LOS ANGELES
Source: DAI/B 71-12, p. , Dec 2010
Source Type: Dissertation
Subjects: Electrical engineering; Artificial intelligence; Computer science
Publication Number: 3431882
