The pitfalls of audio-visuo-tactile research

I am going to be submitting an article entitled “Tri-modal Speech: Audio-Visual-Tactile integration in Speech Perception”, along with my co-authors Doreen Hansmann and Catherine Theys, within the month. The article was, in the end, a success, demonstrating that visual and tactile speech can, separately and jointly, enhance or interfere with accurate auditory syllable identification in two-way forced-choice experiments.

However, I am writing this short post to serve as a warning to anyone who wishes to combine visual, tactile, and auditory speech perception research into one experiment. Today’s technology makes that exceedingly difficult:

The three of us have collective experience with electroencephalography, magnetic resonance imaging, and combining ultrasound imaging of the tongue with electromagnetic articulometry. These are complex tasks that require a great deal of skill and training to complete successfully. Yet this paper's research was the most technically demanding and error-prone task we have ever encountered. The reason is that, despite all of the video you see online today, modern computers do not easily allow for research-grade, synchronized video within experimental software. Because of today's multi-core central processing, it was in fact easier to do such things 15 years ago than it is now. The number and variety of computer bugs in the operating system, video and audio library codecs, and experimental software presentation libraries were utterly overwhelming.

We programmed this experiment in PsychoPy2, and after several rewrites and switching between a number of video and audio codecs, we were forced to abandon the platform entirely due to unfixable intermittent crashes and switch to MATLAB and PsychToolBox. PsychToolBox also had several issues, but with several days of system debugging effort by Johnathan Wiltshire, programmer analyst at the University of Canterbury's psychology department, these issues were at least resolvable. We cannot thank Johnathan enough! In addition, electrical issues with our own air flow system made completion of this research a daunting task, requiring a lot of help and repairs from Scott Lloyd of Electrical Engineering. Scott did a lot of burdensome work for us, and we are grateful.

All told, I alone lost almost 100 working days to debugging and repair efforts during this experiment. We therefore recommend that anyone who follows up on this research make sure that they have collaborators with backgrounds in both engineering and information technology, work in labs with technical support, and have budgets and people who can and will build electrically robust equipment. We also recommend not just testing, debugging, and piloting experiments, but also building automated tools that run an experiment over and over, so that uncommon intermittent errors can be identified and resolved before data collection begins.
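To make that last recommendation concrete, here is a minimal sketch of the kind of stress-test harness we mean, written in Python. The experiment entry point (run_block.py), the log file name, and the number of repetitions are hypothetical placeholders, not part of our actual setup; the point is simply to re-run an experiment block far more often than any human pilot would, and to log every crash and timing anomaly for later inspection.

```python
import csv
import subprocess
import time

RUNS = 200                                       # repeat far beyond normal piloting
CMD = ["python", "run_block.py", "--headless"]   # hypothetical experiment entry point

with open("stress_test_log.csv", "w", newline="") as log:
    writer = csv.writer(log)
    writer.writerow(["run", "return_code", "seconds", "stderr_tail"])
    for run in range(RUNS):
        start = time.time()
        proc = subprocess.run(CMD, capture_output=True, text=True)
        elapsed = time.time() - start
        tail = proc.stderr.strip().splitlines()[-1] if proc.stderr.strip() else ""
        # Every run is logged; rare crashes and abnormal durations stand out later.
        writer.writerow([run, proc.returncode, round(elapsed, 2), tail])
```

Sorting the resulting log by return code or run duration makes intermittent failures visible that a handful of manual pilot runs would almost certainly miss.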

Your mental health depends on you following this advice.

Visual-tactile Speech Perception and the Autism Quotient

Katie Bicevskis, Bryan Gick, and I just had "Visual-tactile Speech Perception and the Autism Quotient" – our reexamination and expansion of our evidence for ecologically valid visual-tactile speech perception – accepted to Frontiers in Communication: Language Sciences. Right now only the abstract and introductory parts are online, but the whole article will be up soon. The major contribution of this article is that speech perceivers integrate air flow information during visual speech perception with greater reliance upon event-related accuracy the more they self-describe as neurotypical. This behaviour supports the Happé & Frith (2006) weak coherence account of Autism Spectrum Disorder. Put very simply, neurotypical people perceive whole events, but people with ASD perceive uni-sensory parts of events, often in greater detail than their neurotypical counterparts. This account partially explains how autistic people can have deficits in imagination and social skills, yet be extremely capable in other areas of inquiry. Previous models of ASD offered an explanation of disability; Happé and Frith offer an explanation of different ability.

I will be expanding on this discussion, with a plain English explanation of the results, once the article is fully published.  For now, the article abstract is re-posted here:

“Multisensory information is integrated asymmetrically in speech perception: An audio signal can follow video by 240 milliseconds, but can precede video by only 60 ms, without disrupting the sense of synchronicity (Munhall et al., 1996). Similarly, air flow can follow either audio (Gick et al., 2010) or video (Bicevskis et al., 2016) by a much larger margin than it can precede either while remaining perceptually synchronous. These asymmetric windows of integration have been attributed to the physical properties of the signals; light travels faster than sound (Munhall et al., 1996), and sound travels faster than air flow (Gick et al., 2010). Perceptual windows of integration narrow during development (Hillock-Dunn and Wallace, 2012), but remain wider among people with autism (Wallace and Stevenson, 2014). Here we show that, even among neurotypical adult perceivers, visual-tactile windows of integration are wider and flatter the higher the participant’s Autism Quotient (AQ) (Baron-Cohen et al, 2001), a self-report screening test for Autism Spectrum Disorder (ASD). As ‘pa’ is produced with a tiny burst of aspiration (Derrick et al., 2009), we applied light and inaudible air puffs to participants’ necks while they watched silent videos of a person saying ‘ba’ or ‘pa’, with puffs presented both synchronously and at varying degrees of asynchrony relative to the recorded plosive release burst, which itself is time-aligned to visible lip opening. All syllables seen along with cutaneous air puffs were more likely to be perceived as ‘pa’. Syllables were perceived as ‘pa’ most often when the air puff occurred 50-100 ms after lip opening, with decaying probability as asynchrony increased. Integration was less dependent on time-alignment the higher the participant’s AQ. Perceivers integrate event-relevant tactile information in visual speech perception with greater reliance upon event-related accuracy the more they self-describe as neurotypical, supporting the Happé & Frith (2006) weak coherence account of ASD.”
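For readers who want a sense of what "wider and flatter windows" means in modelling terms, here is an illustrative sketch (not the analysis from the article) of how one might test it in Python. The data file and column names are hypothetical: one row per trial, with a binary 'pa' response, the puff asynchrony in milliseconds, and the participant's AQ score. The 75 ms reference point is simply the middle of the 50-100 ms peak reported in the abstract.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial-level data: pa_response (1 = heard 'pa'), asynchrony
# (puff onset minus lip opening, in ms), and aq (Autism Quotient score).
df = pd.read_csv("vt_trials.csv")

# Distance from a nominal 75 ms integration peak; a flatter window means a
# weaker effect of this distance on the probability of a 'pa' response.
df["offset"] = (df["asynchrony"] - 75).abs()

model = smf.logit("pa_response ~ offset * aq", data=df).fit()
print(model.summary())  # a positive offset:aq interaction suggests flattening with AQ
```

In this toy setup, a negative main effect of offset captures the decay of 'pa' responses with asynchrony, and a positive offset-by-AQ interaction captures the reduced reliance on time-alignment among higher-AQ perceivers.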

Feldmann’s “Do Linguistic Structures Affect Human Capital?”: Rebuttal is better than suppression.

There is a move afoot to have Kyklos retract "Do Linguistic Structures Affect Human Capital? The Case of Pronoun Drop", by Prof. Horst Feldmann of the University of Bath. The motivation is that Feldmann used faulty statistical reasoning to argue that language structure influences economic wealth.

There are two main flaws: 1) The assumption that pro-drop languages are categorically different from non-pro-drop languages in the first place. I have never seen a formal language model that suggests such a thing, though functional models likely allow for the possibility. (*Edit: a colleague privately told me of a formal model that does categorize pro-drop and non-pro-drop languages differently, but I will not discuss it further, as they do not want to address the issue publicly.) 2) The assumption that languages are equally independent from each other. This is definitely wrong: it is obvious on many levels that English and French are, for instance, more similar to each other than English and Japanese are, by both lineage and organization. Taking the second point into account might seriously alter any statistical model used to analyze the world language data in Feldmann's article.
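As a minimal illustration of how one might begin to address the second flaw, a regression can at least include a random intercept per language family instead of treating every language as an independent observation. The sketch below is hypothetical: the data file, column names, and the single family-level grouping are placeholders, and a serious reanalysis would need much finer phylogenetic and areal controls.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: human_capital (outcome), pro_drop (0/1 predictor),
# and language_family (e.g. Indo-European, Japonic) for each language.
df = pd.read_csv("language_economy.csv")

# A random intercept per language family relaxes the assumption that all
# languages are equally independent data points.
model = smf.mixedlm("human_capital ~ pro_drop", data=df,
                    groups=df["language_family"]).fit()
print(model.summary())
```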

However, I do not support this effort to demand Kyklos retract his article. It is much better to write an article that reexamines the data, using properly applied and properly reasoned statistical analysis, and rebuts Feldmann’s points if they are shown to be incorrect.

Once you go down the road of demanding that articles be retracted, not due to fraud or utter falsehood, but instead due to what you consider bad analysis, you’ve gone too far. I am morally gutted that any of my fellow linguists believe they can fight bad argumentation through suppression rather than effective counter-argument, and I repudiate such efforts.

Now, to be honest about myself and my limitations, I mostly ignore Economists when they talk about Linguistics in an Economics journal, just as they might ignore me were I to talk about Economics in a Linguistics journal. However, if any of my readers feels strongly enough to want to see the article retracted, here is my advice: it is much better to simply argue against the ideas, preferably using better statistical models, and write a great article while doing so. And if you do it well enough, you'll really help your own career as well.

If your reanalysis shows Feldmann is thoroughly wrong, say so, and say it as forcefully as you want. But be prepared to end up agreeing with some of what Feldmann had to say. That outcome is possible because you don't really know what a thorough analysis will show before you run it. And if you think you can know the result in advance with certainty (rather than just strongly suspect it), you might need to improve your scientific acumen.

Rivener – a maskless airflow estimation and nasalance system

Jenna Duerr, Rachel Grace Kerr, and I recently published an article documenting the main instrumental uses for Rivener, our maskless air flow estimation and nasalance system. The device records audio and low-frequency pseudo-sound with microphones placed at the nose and mouth, separated by a baffle and placed in a Venturi tube to prevent the pseudo-sound from overloading the circuitry. It can record these aspects of hearing-impaired speech without interfering with the audio quality of the speaker or requiring physical contact with the system. If you want a detailed description of what the system can do, here is an unpublished "white paper" documenting its strengths and limitations in detail.
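For readers who want a concrete sense of the nasalance side, below is a minimal sketch of the standard nasalance computation on a two-channel recording (nose and mouth microphones). This is not Rivener's actual signal chain, and the file name, channel assignment, and filter band are assumptions; the white paper describes what the system really does.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

# Hypothetical two-channel file: channel 0 = nose mic, channel 1 = mouth mic.
rate, data = wavfile.read("rivener_recording.wav")
nasal = data[:, 0].astype(float)
oral = data[:, 1].astype(float)

# Band-pass the audible range first, so low-frequency pseudo-sound (air flow)
# does not dominate the acoustic energy estimate.
sos = butter(4, [300, 4000], btype="bandpass", fs=rate, output="sos")
nasal_f = sosfiltfilt(sos, nasal)
oral_f = sosfiltfilt(sos, oral)

# Standard nasalance: nasal energy as a percentage of total acoustic energy.
nasalance = 100 * np.sum(nasal_f**2) / (np.sum(nasal_f**2) + np.sum(oral_f**2))
print(f"Utterance-level nasalance: {nasalance:.1f}%")
```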

Book review: Mythic Orbits 2

Hello readers,

I have kept this site for professional work to date, but following a recent shout-out, today I'm going to introduce you to something totally different: a review of speculative fiction.

After an initial awkward meeting at Dalhousie Chemistry Week, Kristin Janz and I became friends while attending university at Dalhousie, in Nova Scotia (the remote outcropping of rock on the eastern edge of Canada where we both grew up).

I recently read a book to which she contributed a short story, Mythic Orbits Volume 2, and enjoyed it greatly.

Now, while all of the stories are worth reading, a couple of them had themes I've seen often before. I'm not good at reviewing stories with themes I've seen often before, so I'm going to give short thoughts on the ones that are newer to me. For people like me, it is worth noting that the stories get more original in theme as you progress through the book, but that is a very vague generalization, so you are better off with my very short reviews of each story:

Donald S. Crankshaw’s “Her Majesty’s Guardian” was a simple, well-executed piece with a glorious conclusion.  It reminded me of the way many of the smaller societies of Earth used to handle leader purification – brutal and effective!

Linda Burklin’s “Dragon Moon” is a visually stunning story with heartwarming family-protective elements.

Kristin Janz’s “The Workshop at the End of the World” evokes perfectly how I feel every time I consider walking into a “Toys’R’Us” – and then decide I just can’t face how bloody boring the store is.  Let the reader understand.

Cindy Koepp’s “Seeking What’s Lost” is raw and brutal and deeply personally tragic.  Keep a box of tissues nearby, and be prepared to use them liberally.

If you are a religiously active Christian like me, you would think you’ve read C.O. Bonham’s “Recalled from the Red Planet” a million times…  But oddly enough, you haven’t, because no one is ever this wonderfully direct about this particular story.

William Bontrager's "They stood still" was my favorite story. Years ago a friend showed me a draft of a novel she was working on, with scenes of time standing still that were so good I've spent the decades since waiting to read anything like them again. I will never forget how it felt to read time stop, the sheer wonder and utter terror of it. I felt the whole world around me go quiet. Time stood still for me. And Bontrager brought me to that quiet place for only the second time in my reading life. The rest of his story of post-traumatic stress is just as good, and I would have bought this entire book for that story alone.

A.K. Meek's "The Memory Dance" is easily the strangest and most original piece in this collection. In some ways, it reminded me of "Leaf by Niggle", one of Tolkien's greatest short stories. And beyond that comparison, for the most part, if you want to read something this wonderfully out-there, you have to go back to pre-1940s sci-fi.

Keturah Lamb's "Unerella" is a glorious take on Cinderella, and I wish there were many more such stories on Earth – a situation I'm very slowly trying to remedy myself. Kat Heckenbach's "Mark the Days" is the kind of story you wish movies like "Memento" or "Mulholland Drive" could be – a tiny bit easier to follow, and infinitely less pretentious. Give the whole book a read, and you'll be happy.

Matthias Heyne – Language and Music

Today I want to highlight the work of my former PhD student and always-colleague, Matthias Heyne.  Matthias is currently a Postdoctoral Research Associate in the Department of Speech, Language and Hearing Sciences at Boston University (more specifically, the Speech Neuroscience Laboratory, PI Prof. Frank Guenther).

Matthias has done amazing research into the relationship between native language and trombone playing style. To quote him, Matthias' "research explores the relationship of referential and non-referential forms of communication, such as language and (instrumental) music, respectively."

Matthias and I have published an overview of visualization research into how people play brass instruments. In addition, Matthias and I helped improve the way we analyze tongue contour shapes, and most recently, Matthias Heyne, Xuan Wang, myself, Kieran Dorreen, and Kevin Watson published an article demonstrating that /r/ production in non-rhotic New Zealand English follows many of the patterns found in rhotic North American English.

Over the next year you can expect many more publications from Matthias, demonstrating the relationship between the acoustics and articulation of Tongan and English vowels and tongue position during steady-state trombone notes. Expect research using diffusion MRI to follow, as Matthias will be adding brain imaging research to his repertoire.

Matthias is an excellent new researcher, and I expect great things from him throughout a long career.  I am very proud to have had him as a PhD student, and to continue working and publishing with him.

3D-printable ultrasound probe stabilizer for speech research

Christopher Carignan, Wei-rong Chen, Muawiyath Shujau, Catherine T. Best, and I recently published an article about our new 3D-printable ultrasound transducer stabilizer (probe holder).

Ultrasound tongue imaging of speech requires the imaging probe to remain stable throughout data collection. Previous solutions to this stabilization problem have often been too cumbersome and/or expensive for widespread use. Our solution improves upon previous designs in both functionality and comfort, while also representing the first free and open-source 3D-printable headset for both academic and clinical applications of ultrasound tongue imaging.

The non-metallic design permits the simultaneous collection of ultrasound and electromagnetic articulometry data. For clinicians, the headset eliminates the need to hold the imaging probe manually, allowing them to interact with patients in an unencumbered way.

The printable materials we provide work for midsagittal imaging of the tongue using a few select ultrasound transducers, such as the Logiq E 8C-RS and the Telemed transducers for Articulate Instruments systems, but can be modified easily to accommodate other probes, or coronal tongue imaging.

The system costs from USD $200 (for a 100 micron print) to USD $600 (for a 20 micron print) in materials to produce, making it quite affordable. It is also very comfortable compared to most stabilization systems, and is accurate to within about 2 mm of motion in any direction, and 2 degrees of rotation in any direction. More details can be found in the article documenting the system.

Here is an image of the system, fully assembled and worn:

Transducer stabilizer


The articulation of /ɹ/ in New Zealand English

Matthias Heyne, Xuan Wang, myself (Donald Derrick), Kieran Dorreen, and Kevin Watson recently published an article documenting the articulation of /ɹ/ in New Zealand English.

This work is in part a follow-up to some of my co-authored research into biomechanical modelling of English /ɹ/ variants, which indicates that vocalic context influences variation through muscle stress, strain, and displacement. By these three measures, it is "easier" to move from /i/ to a tip-down /ɹ/, but easier to move from /a/ to a tip-up /ɹ/.

In this study, speakers who vary at all (some produce only tip-up or only tip-down /ɹ/) are most likely to produce tip-up /ɹ/ in the following conditions:

back vowel > low central vowel > high front vowel

initial /ɹ/ > intervocalic /ɹ/ > following a coronal (“dr”) > following a velar (“cr”)

The results show that allophonic variation of NZE /ɹ/ is similar to that of American English, indicating that the variation is caused by similar constraints. They also support theories of locally optimized modular speech motor control, and a mechanical model of rhotic variation.

The abstract is repeated below, with links to articles contained within:

This paper investigates the articulation of approximant /ɹ/ in New Zealand English (NZE), and tests whether the patterns documented for rhotic varieties of English hold in a non-rhotic dialect. Midsagittal ultrasound data for 62 speakers producing 13 tokens of /ɹ/ in various phonetic environments were categorized according to the taxonomy by Delattre & Freeman (1968), and semi-automatically traced and quantified using the AAA software (Articulate Instruments Ltd. 2012) and a Modified Curvature Index (MCI; Dawson, Tiede & Whalen 2016). Twenty-five NZE speakers produced tip-down /ɹ/ exclusively, 12 tip-up /ɹ/ exclusively, and 25 produced both, partially depending on context. Those speakers who produced both variants used the most tip-down /ɹ/ in front vowel contexts, the most tip-up /ɹ/ in back vowel contexts, and varying rates in low central vowel contexts. The NZE speakers produced tip-up /ɹ/ most often in word-initial position, followed by intervocalic, then coronal, and least often in velar contexts. The results indicate that the allophonic variation patterns of /ɹ/ in NZE are similar to those of American English (Mielke, Baker & Archangeli 2010, 2016). We show that MCI values can be used to facilitate /ɹ/ gesture classification; linear mixed-effects models fit on the MCI values of manually categorized tongue contours show significant differences between all but two of Delattre & Freeman's (1968) tongue types. Overall, the results support theories of modular speech motor control with articulation strategies evolving from local rather than global optimization processes, and a mechanical model of rhotic variation (see Stavness et al. 2012).
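As an illustration of the kind of linear mixed-effects analysis the abstract describes (fit on MCI values per tongue contour), here is a minimal sketch in Python. The data file and column names are hypothetical, and the published analysis may use a different model structure and set of contrasts.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: mci (Modified Curvature Index for each traced contour),
# tongue_type (manually assigned Delattre & Freeman category), and speaker.
df = pd.read_csv("nze_r_contours.csv")

# Fixed effect of tongue type, random intercept per speaker, fit on MCI values.
model = smf.mixedlm("mci ~ C(tongue_type)", data=df,
                    groups=df["speaker"]).fit()
print(model.summary())  # pairwise contrasts would then compare the tongue types
```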

Trip to Taiwan: Talks and conference

I spent October 14 to October 23, 2017 in Taiwan, giving many talks. My first was a talk at National Taiwan University on Monday the 16th, where I spoke about commercializing research.

On Wednesday the 18th, I went to Academia Sinica and the Institute of Linguistics and spoke on aero-tactile integration in speech perception.

Lastly, on the weekend of the 21st-22nd, I spoke at a workshop at National Tsing Hua University (my hosts) on ultrasound and EMA research.

If you want copies of the talks, send an email to my work address. Apologies for the hassle: they are all too large to post to this website.

Ultrasound/EMA guide

This is a guide to the use of ultrasound and EMA in combination. It is a bit out of date, and probably needs a day or two of work to make fully correct, but it describes the techniques I use when running such experiments with a team of three researchers. Of course, I wrote this years ago, and now I can run an ultrasound/EMA experiment by myself if I need to.