Monthly Archives: November 2019

Native language influence on brass instrument performance

Matthias Heyne, Jalal Al-Tamimi, and I recently published “Native language influence on brass instrument performance: An application of generalized additive mixed models (GAMMs) to midsagittal ultrasound images of the tongue”. The paper contains the bulk of the results from Matthias’ PhD dissertation. The study is huge, with ultrasound tongue recordings of 10 New Zealand English (NZE) and 10 Tongan trombone players. There are 12,256 individual tongue contours of vowel tokens (7,834 for NZE, 4,422 for Tongan) and 7,428 tongue contours of sustained note production (3,715 for NZE, 3,713 for Tongan).

The results show that native language influences tongue position during trombone note production. The influence shows up in both tongue position and note variability. The results also support Dispersion Theory (Liljencrants and Lindblom, 1972; Lindblom, 1986; Al-Tamimi and Ferragne, 2005) in that vowel production is more variable in Tongan, which has few vowels, than in NZE, which has many.

The results also show that note production at the back of the tongue maps to low-back vowel production (schwa and ‘lot’ for NZE, /o/ and /u/ for Tongan). These two result sets support an analysis of local optimization with semi-independent tongue regions (Ganesh et al., 2010; Loeb, 2012).

The results do not, however, support the traditional brass pedagogy hypothesis that higher notes are played with a closer (higher) tongue position. That said, Matthias is currently working with MRI data that *does* support the brass pedagogy hypothesis, so we might not have seen the effect because of the ultrasound transducer stabilization system needed to keep the ultrasound probe aligned to the participant’s head.

Liljencrants, Johan, and Björn Lindblom. 1972. “Numerical Simulation of Vowel Quality Systems: The Role of Perceptual Contrast.” Language, 839–62.

Lindblom, Björn. 1963. “Spectrographic Study of Vowel Reduction.” The Journal of the Acoustical Society of America 35 (11): 1773–1781.

Al-Tamimi, J., and Ferragne, E. 2005. “Does vowel space size depend on language vowel inventories? Evidence from two Arabic dialects and French,” in Proceedings of the Ninth European Conference on Speech Communication and Technology, Lisbon, 2465–2468.

Ganesh, Gowrishankar, Masahiko Haruno, Mitsuo Kawato, and Etienne Burdet. 2010. “Motor Memory and Local Minimization of Error and Effort, Not Global Optimization, Determine Motor Behavior.” Journal of Neurophysiology 104 (1): 382–90.

Loeb, Gerald E. 2012. “Optimal Isn’t Good Enough.” Biological Cybernetics 106 (11–12): 757–65.

Tri-modal speech: Audio-visual-tactile integration in speech perception

Doreen Hansmann, Catherine Theys, and I just published our article on “Tri-modal Speech: Audio-visual-tactile Integration in Speech Perception” in the Journal of the Acoustical Society of America. This paper was also presented as a poster at the American Speech-Language-Hearing Association (ASHA) Annual Convention in Orlando, Florida, November 21-22, 2019, winning a meritorious poster award.

TL;DR: People use auditory, visual, and tactile speech information to accurately identify syllables in noise. Auditory speech information is the most important, then visual information, and lastly aero-tactile information – but we can use them all at once.

Abstract: Speech perception is a multi-sensory experience. Visual information enhances (Sumby and Pollack, 1954) and interferes (McGurk and MacDonald, 1976) with speech perception. Similarly, tactile information, transmitted by puffs of air arriving at the skin and aligned with speech audio, alters (Gick and Derrick, 2009) auditory speech perception in noise. It has also been shown that aero-tactile information influences visual speech perception when an auditory signal is absent (Derrick, Bicevskis, and Gick, 2019a). However, researchers have not yet identified the combined influence of aero-tactile, visual, and auditory information on speech perception. The effects of matching and mismatching visual and tactile speech on two-way forced-choice auditory syllable-in-noise classification tasks were tested. The results showed that both visual and tactile information altered the signal-to-noise threshold for accurate identification of auditory signals. Similar to previous studies, the visual component has a strong influence on auditory syllable-in-noise identification, as evidenced by a 28.04 dB improvement in SNR between matching and mismatching visual stimulus presentations. In comparison, the tactile component had a small influence resulting in a 1.58 dB SNR match-mismatch range. The effects of both the audio and tactile information were shown to be additive.

Derrick, D., Bicevskis, K., and Gick, B. (2019a). “Visual-tactile speech perception and the autism quotient,” Frontiers in Communication – Language Sciences 3(61), 1–11, doi:

Gick, B., and Derrick, D. (2009). “Aero-tactile integration in speech perception,” Nature 462, 502–504, doi:

McGurk, H., and MacDonald, J. (1976). “Hearing lips and seeing voices,” Nature 264, 746–748, doi:

Calculating an Erdős-Chomsky-Bacon number – 13

Some days it is hard to focus on work – any day where I have to look at large-scale copy-edits is one of them. So I decided to procrastinate by calculating my (modified) Erdős-Chomsky-Bacon number: the number of co-authorship links connecting me to Paul Erdős, plus the number connecting me to Noam Chomsky, plus the number of filmed acting links connecting me to Kevin Bacon. That last part is a cheat because a Bacon number is supposed to be movie-only connections, but I’m OK with that because I was paid to do the acting.

My Erdős-Chomsky-Bacon number is 13:

Erdős Number = 4

Donald Derrick -> Daniel Archambault
Derrick, Donald and Archambault, Daniel Treeform: Explaining and exploring grammar through syntax trees. Literary and Linguistic Computing, (2010). 25(1):53–66.

Daniel Archambault -> David G. Kirkpatrick
Archambault, Daniel; Evans, William; Kirkpatrick, David Computing the set of all the distant horizons of a terrain. Internat. J. Comput. Geom. Appl. 15 (2005), no. 6, 547–563.

David G. Kirkpatrick -> Pavol Hell
Kirkpatrick, D. G.; Hell, P. On the complexity of general graph factor problems. SIAM J. Comput. 12 (1983), no. 3, 601–609.

Pavol Hell -> Paul Erdős
Erdös, P.; Hell, P.; Winkler, P. Bandwidth versus bandsize. Graph theory in memory of G. A. Dirac (Sandbjerg, 1985), 117–129, Ann. Discrete Math., 41, North-Holland, Amsterdam, 1989.

Chomsky number = 5

Donald Derrick -> Michael I. Proctor
Examining speech production using masked priming.
Chris Davis, Jason A. Shaw, Michael I. Proctor, Donald Derrick, Stacey Sherwood, Jeesun Kim
Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Michael I. Proctor -> Louis Goldstein
Analysis of speech production real-time MRI.
Vikram Ramanarayanan, Sam Tilsen, Michael I. Proctor, Johannes Töger, Louis Goldstein, Krishna S. Nayak, Shrikanth Narayanan
Computer Speech & Language, 2018

Louis Goldstein -> Srikantan S. Nagarajan
A New Model of Speech Motor Control Based on Task Dynamics and State Feedback.
Vikram Ramanarayanan, Benjamin Parrell, Louis Goldstein, Srikantan S. Nagarajan, John F. Houde
Proceedings of the Interspeech 2016, 2016

Srikantan S. Nagarajan -> David Poeppel
Asymptotic SNR of scalar and vector minimum-variance beamformers for neuromagnetic source reconstruction.
Kensuke Sekihara, Srikantan S. Nagarajan, David Poeppel, Alec Marantz
IEEE Trans. Biomed. Engineering, 2004

David Poeppel -> Noam Chomsky
Governing Board Symposium The Biology of Language in the 21st Century.
Noam Chomsky, David Poeppel, Patricia Churchland, Elissa L. Newport
Proceedings of the 33rd Annual Meeting of the Cognitive Science Society, 2011

Bacon number = 4

Donald Derrick -> Earl Quewezance
“Frontrunners” by European News at the “Play the Game” conference (2005)

Earl Quewezance -> Rob Morrow
“The Mommy’s Curse”, episode 6, Northern Exposure (1995)

Rob Morrow -> Embeth Davidtz
The Emperor’s Club (2002)

Embeth Davidtz -> Kevin Bacon
Murder in the First (1995)
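
For anyone who wants to check the arithmetic, the three chains above can be tallied with a small breadth-first search. This is only a sketch: the graph holds nothing but the links listed in this post, so the distances it finds are upper bounds on the true numbers rather than proven minimums, and the `LINKS`, `build_graph`, and `distance` names are my own invention for illustration.

```python
from collections import deque

# Links copied from the chains listed in this post (co-authorship or shared
# screen credit). Assumption: no other connections are considered.
LINKS = [
    ("Donald Derrick", "Daniel Archambault"),
    ("Daniel Archambault", "David G. Kirkpatrick"),
    ("David G. Kirkpatrick", "Pavol Hell"),
    ("Pavol Hell", "Paul Erdős"),
    ("Donald Derrick", "Michael I. Proctor"),
    ("Michael I. Proctor", "Louis Goldstein"),
    ("Louis Goldstein", "Srikantan S. Nagarajan"),
    ("Srikantan S. Nagarajan", "David Poeppel"),
    ("David Poeppel", "Noam Chomsky"),
    ("Donald Derrick", "Earl Quewezance"),
    ("Earl Quewezance", "Rob Morrow"),
    ("Rob Morrow", "Embeth Davidtz"),
    ("Embeth Davidtz", "Kevin Bacon"),
]


def build_graph(links):
    """Turn a list of pairs into an undirected adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, start, goal):
    """Return the number of links on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, steps = queue.popleft()
        if person == goal:
            return steps
        for neighbour in graph.get(person, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None


graph = build_graph(LINKS)
erdos = distance(graph, "Donald Derrick", "Paul Erdős")
chomsky = distance(graph, "Donald Derrick", "Noam Chomsky")
bacon = distance(graph, "Donald Derrick", "Kevin Bacon")
total = erdos + chomsky + bacon
print(f"Erdős {erdos} + Chomsky {chomsky} + Bacon {bacon} = {total}")
```

Adding any shorter link to `LINKS` (a new co-author, say) would lower the corresponding number without changing the rest of the code.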