In 2013, I recorded 11 North American English speakers, each reading eight phrases with two flaps in two syllables (e.g., “We have editor books”) at five speech rates, from about 3 to 7 syllables per second. Each recording included audio, ultrasound imaging of the tongue, and electromagnetic articulometry (EMA).
The dataset has taken a truly inordinate amount of time to label, transcribe (thank you, Romain Fiasson), rotate, align ultrasound to audio, fit in shared time (what is known as a Procrustean fit), extract acoustic correlates from, and clean of tokens that have recording errors or unfixable alignment errors.
It is, however, now 2019, and I have a cleaned dataset. I’ve uploaded the dataset, with data at each point of processing included, to an Open Science Framework website. Over the next few weeks, I will upload documentation on how I processed the data, as well as videos of the cleaned data showing ultrasound and EMA motion.
By September 1st, I plan to submit a research article discussing the techniques used to build the dataset, as well as a theoretically motivated subset of the articulatory-to-acoustic correlates within it, to a special issue of a journal whose name I will disclose should they accept the article for publication.
This research was funded by a Marsden Grant from New Zealand, “Saving energy vs. making yourself understood during speech production”. Thanks to Mark Tiede for writing the quaternion rotation tools needed to orient EMA traces, and to Christian Kroos for teaching our group at Western Sydney University how to implement them. Thanks to Michael Proctor for building filtering and sample repair tools for EMA traces. Thanks also to Wei-rong Chen for writing the palate estimation tool needed to replace erroneous palate traces. Special thanks to Scott Lloyd for his part in developing and building the ultrasound transducer holder prototype used in this research. Dedicated to the memory of Romain Fiasson, who completed most of the labelling and transcription for this project.