Monthly Archives: October 2017

Trip to Taiwan: Talks and conference

I spent October 14 to October 23, 2017, in Taiwan, giving a number of talks.  My first was at National Taiwan University on Monday the 16th, where I spoke about commercializing research.

On Wednesday the 18th, I visited the Institute of Linguistics at Academia Sinica and spoke on aero-tactile integration in speech perception.

Lastly, on the weekend of the 21st and 22nd, I spoke at a workshop at National Tsing Hua University (my hosts) on ultrasound and EMA research.

If you want copies of the talks, send an email to my work address.  Apologies for the hassle: they are all too large to post on this website.

Ultrasound/EMA guide

This is a guide to using ultrasound and EMA in combination.  It is a bit out of date, and probably needs a day or two of work to make fully correct, but it describes the techniques as I use them with a team of three researchers.  Of course, I wrote this years ago, and I can now run an ultrasound/EMA experiment by myself if I need to.

Ultrasound Video and Microphone Audio Capture

This is a simple set of one-line scripts for capturing ultrasound video and microphone audio.

I built them as batch files run from the Windows command line because that is the OS that seems to give me the highest frame rate. (I use Macs, and this works with Windows booted from Boot Camp.)
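To give a sense of what such a one-liner looks like, here is a minimal sketch assuming ffmpeg with DirectShow capture devices; the device names, resolution, frame rate, and codecs are placeholders, and the actual scripts in the download may differ.

rem capture ultrasound video and microphone audio into a single file (device names are placeholders)
ffmpeg -f dshow -video_size 1024x768 -framerate 60 -i video="Ultrasound Capture":audio="Microphone (USB Audio)" -c:v libx264 -preset ultrafast -c:a pcm_s16le capture.mkv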

Look at the README file to make sure you use the scripts properly.

As always, contact me if you have issues.


Crop and Segment Video

Here I offer you a program that will scan through all of the PRAAT TextGrids in a folder.  For each one, it will search for the named TextGrid tier, loop through each interval in that tier, find the ones with text in them, and cut clips from a video with the exact same base name based on those time stamps.  Each clip will be cropped to the region given in the cropping variable (currently set for the Logiq E ultrasound).
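Under the hood the details may differ, but each extracted clip is equivalent to a single FFMPEG call along these lines; the timestamps, crop region (crop=width:height:x:y), and file names below are illustrative rather than the Logiq E defaults.

rem cut one labelled interval from the source video and crop it to the region of interest
ffmpeg -i session01.mov -ss 12.34 -to 13.02 -filter:v "crop=720:540:160:60" session01_word03.mp4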

The program uses R as a wrapper: it calls a PERL program, textgrid2csv.pl (copyright Theo Veenker <T.J.G.Veenker@uu.nl>), to convert each PRAAT TextGrid into a CSV file, and then works with that data in R.

Therefore: 1) You have to extract the audio from the video file you want to crop and segment, and transcribe and label it in a PRAAT TextGrid at the level of detail you want for each cropped video file (usually a word or phrase). 2) Go into the code and change all the variables at the top according to your needs.

Lots of work, but this program will still save you heaps of time.  It is especially useful if you are using AAA for ultrasound analysis but only have video instead of AAA’s proprietary ultrasound file storage format.

Note, I provide sample data in the zip file to test the program – a swallow used for a palate trace.  Get the program to run with the sample before you modify it for your own purposes.

Aligning Audio and Video

Dealing with video files is just about the most obnoxious experience a researcher can have.  I wasted a *year* of research getting this one wrong before I realized the only, and I do mean only, effective solution involves FFMPEG.  Here I offer you a program that will re-align every video held in one directory for which you have alignment data.

The program uses R as a wrapper to load a .csv file that contains the metadata for the directory of video files you want to align.

Therefore: 1) You have to hand-check the audio-visual offset for each file and put it into the .csv file. 2) You also have to make sure you have installed FFMPEG, SOX, PERL, R, and the R packages “reader” and “gdata”. 3) You have to look inside my R code and change the paths and extensions so that the program will work on your computer.
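For reference, the per-file correction amounts to an FFMPEG call like the sketch below, which reads the same file twice and delays the audio stream relative to the video; the 0.125-second offset and file names are illustrative, and the R code takes the real offsets from the .csv file.

rem re-mux the file with the audio delayed by the measured offset
ffmpeg -i session01.mov -itsoffset 0.125 -i session01.mov -map 0:v:0 -map 1:a:0 -c copy session01_aligned.mov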

I provide a sample video with a swallow used to obtain a palate trace.  Get the software to work on your machine with this sample before modifying the code for your project.


‘Air Puffs’ – RNZ broadcast

Some time ago, I was on Radio New Zealand discussing my research on the use of air flow to enhance speech perception. Alas, it did not have the commercial value we thought it would, because enhancing the perception of continuous speech requires more airflow than is feasible to deliver. However, it has since led to the development of a mask-less and plate-less air flow estimation system that works well. The system provides useful biofeedback that has the potential to help with speech therapy and accent modification.

Phoneme Quality

I rewrote a PRAAT script – shamelessly edited from Mietta Lennes’s amazing original – modified to work well on both Mac and PC.  The script opens all the WAVE or AIFF files and their matching TextGrids, looks at the relevant tier (defaults to 3), and extracts duration, f0 (pitch), F1, F2, F3, and spectral centre of gravity (CoG). The PRAAT script and readme file are located here.

Mietta Lennes’s scripts, for those who don’t know, seem to be on GitHub these days.

Minimal Pairs

I wrote this PERL script to output minimal pairs from a document.  To use it properly, you need to take a document and save it as text.  Then you need to substitute items in the text so that each phoneme has a unique ASCII code.  I highly recommend keeping a key of the substitution pairs so you can convert the text back and forth as needed.

Then make sure that PERL is installed on your computer.

Then, from your command-line, go to the directory with both this .pl file AND your text file, and type in the relevant information in this format:

perl minimal_pair.pl {input.txt} {output.txt} {difference} {minimum size}

Difference should be kept at 1 unless you want non-minimal pairs (not well tested).

Minimum size is the minimum word length you wish to capture.
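For example, a hypothetical run that looks for strict minimal pairs among words of at least three phonemes in a file called corpus.txt would be:

perl minimal_pair.pl corpus.txt pairs.txt 1 3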

The output will have columns of minimal pairs with the structure:

minimal pair 1 | minimal pair 2 | phoneme 1 | phoneme 2


Introduction

My name is Donald Derrick, and this website is dedicated to presenting my research on speech production and perception.

On the production side, I examine vocal tract motion (both shape and muscle position), air flow, oral and nasal acoustics, and visual face motion. I then use this production information to study audio, visual, and tactile speech perception. The purpose is to identify constraints on low-level production, and low-level percepts that can enhance or interfere with speech perception.

This research has helped identify constraints on speech production such as gravity, muscle elasticity, and end-state comfort, and has brought true multi-modal research into speech perception by adding (aero-)tactile speech to audio-visual speech studies. I have used this research to expand our understanding of the nature of speech perception, and I have been working on commercializing the use of air flow to enhance speech perception, as well as on recording oral, nasal, and air flow outputs in speech without masks or other stigmatizing measurement systems.

As of 2017, I am working on a sonority scale for visual and tactile speech, as well as both behavioral and brain research on audio-visual-tactile speech, in coordination with the University of Canterbury’s Speech lab.

The end-goal is to form a true multi-sensory understanding of speech production and perception that does not ignore or minimize any of our senses.