In American English, the liquid sounds /r/ and /l/ are the most articulatorily variable and complex sounds. They can be produced with several distinct tongue configurations and are the most troublesome sounds for children and nonnative English speakers to learn. A better understanding of this many-to-one mapping between articulation and acoustics would benefit areas such as speech pathology, speaker verification, speech recognition, and speech synthesis.
In this dissertation, two articulatory configurations were studied for each liquid sound (a "retroflex" /r/ vs. a "bunched" /r/, and a light /l/ vs. a dark /l/). Unlike previous work on liquids, this study used finite element analysis to obtain the acoustic responses of three-dimensional (3-D) vocal tract models based on volumetric magnetic resonance (MR) imaging. Area function models were derived from the wave propagation properties inside the vocal tract.
The retroflex /r/ and the bunched /r/ show similar patterns of F1-F3 but very different spacing between F4 and F5. Results from formant acoustic sensitivity functions and simple-tube vocal tract models suggest that this F4/F5 difference can be explained largely by whether the long cavity behind the palatal constriction acts as a half-wavelength or a quarter-wavelength resonator. For both the retroflex /r/ and the bunched /r/, F4 and F5 (along with F3 for the particular speakers studied in this research) come from the long back cavity. However, these formants are half-wavelength resonances for the retroflex /r/ but quarter-wavelength resonances for the bunched /r/.
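The half- versus quarter-wavelength distinction can be illustrated with the textbook resonance formulas for a lossless uniform tube. The following is only a minimal sketch, not the dissertation's simple-tube model: the function names, the speed of sound (35000 cm/s), and the 10 cm cavity length in the usage note are assumptions chosen for illustration.

```python
# Resonances of an idealized lossless uniform tube.
# Assumption: speed of sound in warm, moist air is about 35000 cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def quarter_wavelength_resonances(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Tube closed at one end and open at the other: f_n = (2n - 1) * c / (4L)."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


def half_wavelength_resonances(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Tube with the same boundary condition at both ends: f_n = n * c / (2L)."""
    return [n * c / (2.0 * length_cm) for n in range(1, count + 1)]


if __name__ == "__main__":
    cavity_cm = 10.0  # hypothetical cavity length, not a measured value
    print("quarter-wavelength:", quarter_wavelength_resonances(cavity_cm))
    print("half-wavelength:   ", half_wavelength_resonances(cavity_cm))
```

For the hypothetical 10 cm tube, the quarter-wavelength series is 875, 2625, 4375 Hz and the half-wavelength series is 1750, 3500, 5250 Hz. The same cavity length therefore places its resonances at different frequencies depending on the boundary conditions at its ends; a real back cavity is not uniform, so these values are indicative only.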
While both the dark /l/ and the light /l/ have a linguo-alveolar contact and two lateral channels, they differ in the length of the linguo-alveolar contact and in the presence of linguopalatal contacts caused by raising the sides of the tongue. Both have similar patterns in F1-F3, but they differ in the number and locations of zeros in the spectrum. For the dark /l/, only one zero occurs below 6 kHz, and it is produced by the cross mode posterior to the linguo-alveolar contact. For the light /l/, three zeros occur below 6 kHz; they are produced by the asymmetrical channels, the supralingual cavity, and the cross mode posterior to the linguo-alveolar contact. The results from two simple vocal tract models show that the lateral channels have to be asymmetrical, with an effective length of 3 to 6 cm, to produce a zero in the region of F3-F5.
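Why asymmetry matters can be sketched with a common textbook idealization, which is not one of the two models used in the dissertation: two lossless, uniform, equal-area channels connected in parallel. Under that assumption the transmission zeros satisfy sin(k*l1) + sin(k*l2) = 0, giving candidate zeros at c / (l1 + l2) and c / (2 * |l1 - l2|); for equal lengths the candidate coincides with a pole and no zero remains. The function name, the speed of sound, and the channel lengths below are hypothetical.

```python
import math

# Assumption: speed of sound in warm, moist air is about 35000 cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def lowest_lateral_zero_hz(len1_cm, len2_cm, c=SPEED_OF_SOUND_CM_S):
    """Lowest transmission zero of two equal-area, lossless parallel channels.

    Returns None for symmetric channels, where the candidate zero is
    cancelled by a coinciding pole in this idealization.
    """
    if math.isclose(len1_cm, len2_cm):
        return None
    sum_zero = c / (len1_cm + len2_cm)                # total length = one wavelength
    diff_zero = c / (2.0 * abs(len1_cm - len2_cm))    # length difference = half wavelength
    return min(sum_zero, diff_zero)


if __name__ == "__main__":
    # Hypothetical channel lengths in cm, chosen only for illustration.
    print(lowest_lateral_zero_hz(4.0, 5.0))  # asymmetric channels
    print(lowest_lateral_zero_hz(4.5, 4.5))  # symmetric channels: None
```

In this idealization, channels of 4 cm and 5 cm give a lowest zero near 3.9 kHz, while two 4.5 cm channels give none. Unequal areas, losses, and the surrounding cavities would shift these values.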
Using the Buckeye database, the acoustic variability and discriminative power of liquids were studied with mel-frequency band energy coefficients as the acoustic parameters. Analysis of variance shows that the inter-speaker variability of /r/ is larger than that of any other phoneme except /sh/, /s/ and /zh/. On average, /r/ and /l/ have larger inter-speaker variability than any other broad phonetic class. The average F-ratios of liquids are larger than those of glides, fricatives, affricates and stops, but smaller than those of nasals. The speaker identification experiments show that the ranking of the average discriminative power for liquids and other broad phonetic classes is: /r/ > Glides > /l/ > Affricates > Fricatives > Stops > Nasals > Vowels.
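The F-ratio mentioned above is commonly defined as the variance of the per-speaker means divided by the average within-speaker variance. The abstract does not state the exact definition used, so the following is a minimal sketch under that assumption; the function name and the toy data are hypothetical and are not drawn from the Buckeye database.

```python
from statistics import mean, pvariance


def f_ratio(samples_by_speaker):
    """Between-speaker variance of the means over mean within-speaker variance.

    samples_by_speaker maps a speaker label to a list of values of one
    acoustic parameter (for example, one mel-frequency band energy).
    """
    speaker_means = [mean(values) for values in samples_by_speaker.values()]
    between = pvariance(speaker_means)
    within = mean(pvariance(values) for values in samples_by_speaker.values())
    return between / within


if __name__ == "__main__":
    # Toy data: two hypothetical speakers, two tokens each.
    toy = {"speaker_a": [1.0, 3.0], "speaker_b": [5.0, 7.0]}
    print(f_ratio(toy))
```

A larger F-ratio means speakers differ more from one another relative to how much each speaker varies internally, which is why it serves as a rough indicator of discriminative power for a given parameter.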