Katie Bicevskis, Bryan Gick, and I just had “Visual-tactile Speech Perception and the Autism Quotient” (our reexamination and expansion of our evidence for ecologically valid visual-tactile speech perception) accepted to Frontiers in Communication: Language Sciences. Right now only the abstract and introductory parts are online, but the whole article will be up soon. The major contribution of this article is the finding that speech perceivers integrate air flow information during visual speech perception with greater reliance upon event-related accuracy the more they self-describe as neurotypical. In other words, the lower a perceiver's Autism Quotient, the more their integration of air flow depends on its timing matching the visible speech event. This behaviour supports the Happé & Frith (2006) weak coherence account of Autism Spectrum Disorder (ASD). Put very simply, neurotypical people perceive whole events, whereas people with ASD perceive uni-sensory parts of events, often in greater detail than their neurotypical counterparts. This account partially explains how autists can have deficiencies in imagination and social skills yet be extremely capable in other areas of inquiry. Previous models of ASD offered an explanation of disability; Happé and Frith offer an explanation of different ability.
I will be expanding on this discussion, with a plain English explanation of the results, once the article is fully published. For now, the article abstract is re-posted here:
“Multisensory information is integrated asymmetrically in speech perception: An audio signal can follow video by 240 milliseconds, but can precede video by only 60 ms, without disrupting the sense of synchronicity (Munhall et al., 1996). Similarly, air flow can follow either audio (Gick et al., 2010) or video (Bicevskis et al., 2016) by a much larger margin than it can precede either while remaining perceptually synchronous. These asymmetric windows of integration have been attributed to the physical properties of the signals; light travels faster than sound (Munhall et al., 1996), and sound travels faster than air flow (Gick et al., 2010). Perceptual windows of integration narrow during development (Hillock-Dunn and Wallace, 2012), but remain wider among people with autism (Wallace and Stevenson, 2014). Here we show that, even among neurotypical adult perceivers, visual-tactile windows of integration are wider and flatter the higher the participant’s Autism Quotient (AQ) (Baron-Cohen et al., 2001), a self-report screening test for Autism Spectrum Disorder (ASD). As ‘pa’ is produced with a tiny burst of aspiration (Derrick et al., 2009), we applied light and inaudible air puffs to participants’ necks while they watched silent videos of a person saying ‘ba’ or ‘pa’, with puffs presented both synchronously and at varying degrees of asynchrony relative to the recorded plosive release burst, which itself is time-aligned to visible lip opening. All syllables seen along with cutaneous air puffs were more likely to be perceived as ‘pa’. Syllables were perceived as ‘pa’ most often when the air puff occurred 50-100 ms after lip opening, with decaying probability as asynchrony increased. Integration was less dependent on time-alignment the higher the participant’s AQ.
Perceivers integrate event-relevant tactile information in visual speech perception with greater reliance upon event-related accuracy the more they self-describe as neurotypical, supporting the Happé & Frith (2006) weak coherence account of ASD.”
This work is in part a follow-up to some of my co-authored research into biomechanical modelling of English /ɹ/ variants, which indicates that vocalic context influences /ɹ/ variation through muscle stress, strain, and displacement. By these three measures, it is “easier” to move from /i/ to a tip-down /ɹ/, but from /a/ to a tip-up /ɹ/.
In this study, speakers who vary at all (some produce only tip-up or only tip-down /ɹ/) are most likely to produce tip-up /ɹ/ in the following conditions, in decreasing order:
back vowel > low central vowel > high front vowel
initial /ɹ/ > intervocalic /ɹ/ > following a coronal (“dr”) > following a velar (“cr”)
The results show that allophonic variation of NZE /ɹ/ is similar to that in American English, suggesting that similar constraints drive the variation in both dialects. The results support theories of locally optimized modular speech motor control, and a mechanical model of rhotic variation.
The abstract is repeated below:
This paper investigates the articulation of approximant /ɹ/ in New Zealand English (NZE), and tests whether the patterns documented for rhotic varieties of English hold in a non-rhotic dialect. Midsagittal ultrasound data for 62 speakers producing 13 tokens of /ɹ/ in various phonetic environments were categorized according to the taxonomy by Delattre & Freeman (1968), and semi-automatically traced and quantified using the AAA software (Articulate Instruments Ltd. 2012) and a Modified Curvature Index (MCI; Dawson, Tiede & Whalen 2016). Twenty-five NZE speakers produced tip-down /ɹ/ exclusively, 12 tip-up /ɹ/ exclusively, and 25 produced both, partially depending on context. Those speakers who produced both variants used the most tip-down /ɹ/ in front vowel contexts, the most tip-up /ɹ/ in back vowel contexts, and varying rates in low central vowel contexts. The NZE speakers produced tip-up /ɹ/ most often in word-initial position, followed by intervocalic, then coronal, and least often in velar contexts. The results indicate that the allophonic variation patterns of /ɹ/ in NZE are similar to those of American English (Mielke, Baker & Archangeli 2010, 2016). We show that MCI values can be used to facilitate /ɹ/ gesture classification; linear mixed-effects models fit on the MCI values of manually categorized tongue contours show significant differences between all but two of Delattre & Freeman’s (1968) tongue types. Overall, the results support theories of modular speech motor control with articulation strategies evolving from local rather than global optimization processes, and a mechanical model of rhotic variation (see Stavness et al. 2012).
My name is Donald Derrick, and this website is dedicated to presenting my research on speech production and perception.
On the production side, I examine vocal tract motion (both shape and muscle position), air flow, oral and nasal acoustics, and visual face motion. I then use this production information to study audio, visual, and tactile speech perception. The purpose is to identify constraints on low-level production, as well as low-level percepts that can enhance or interfere with speech perception.
This research has helped identify constraints on speech production such as gravity, muscle elasticity, and end-state comfort, and has helped bring about truly multimodal speech perception research by adding (aero-)tactile speech to the study of audio-visual speech. I have used this research to expand our understanding of the nature of speech perception, and have been working on commercializing the use of air flow to enhance speech perception, as well as on recording oral, nasal, and air flow outputs in speech without the use of masks or other stigmatizing measurement systems.
As of 2017, I am working on a sonority scale for visual and tactile speech, as well as on both behavioral and brain research on audio-visual-tactile speech in coordination with the University of Canterbury’s Speech lab.
The end-goal is to form a true multi-sensory understanding of speech production and perception that does not ignore or minimize any of our senses.