Building a cleaned dataset of aligned ultrasound, articulometry, and audio.

In 2013, I recorded 11 North American English speakers, each reading eight phrases containing two flaps in two syllables (e.g., “We have editor books”) at five speech rates, ranging from about 3 to 7 syllables/second. Each recording included audio, ultrasound imaging of the tongue, and articulometry.

The dataset has taken a truly inordinate amount of time to label, transcribe (thank you, Romain Fiasson), rotate, align ultrasound to audio, fit into shared time (what is known as a Procrustean fit), extract acoustic correlates from, and clean of tokens with recording errors or unfixable alignment errors.

It is, however, now 2019 and I have a cleaned dataset. I’ve uploaded the dataset, with data from each stage of processing included, to an Open Science Framework website. Over the next few weeks, I will upload documentation on how I processed the data, as well as videos of the cleaned data showing ultrasound and EMA motion.

By September 1st, I plan to submit a research article discussing the techniques used to build the dataset, as well as a theoretically motivated subset of the articulatory-to-acoustic correlates within it, to a special issue of a journal whose name I will disclose should they accept the article for publication.

This research was funded by a Marsden Grant from New Zealand, “Saving energy vs. making yourself understood during speech production”. Thanks to Mark Tiede for writing the quaternion rotation tools needed to orient EMA traces, and to Christian Kroos for teaching our group at Western Sydney University how to implement them. Thanks to Michael Proctor for building filtering and sample repair tools for EMA traces. Thanks also to Wei-rong Chen for writing the palate estimation tool needed to replace erroneous palate traces. Special thanks to Scott Lloyd for his part in developing and building the ultrasound transducer holder prototype used in this research. Dedicated to the memory of Romain Fiasson, who completed most of the labelling and transcription for this project.

Tutorial 4: Coin-toss for Linguists

Here is a basic demonstration of how randomness works. Because I am writing this for linguists rather than statisticians, I’m modifying the standard coin-toss example for speech. Imagine you have a language with words that all start with either “t” or “d”. The word means the same thing either way, so this is a “phonetic” rather than “phonemic” difference. Imagine also that each speaker uses “t” or “d” randomly, about 50% of the time each. Now record four speakers saying 20 of these words 10 times each.

Now ask the question: Will some words have more “t” productions than others?

The answer is ALWAYS yes, even when different speakers produce “t” and “d” sounds as completely random choices. Let me show you:

As with most of these examples I provide, I begin with code for libraries, colors, and functions.

library(tidyverse)
library(factoextra)
library(cluster)

RED0 = (rgb(213,13,11, 255, maxColorValue=255))
BLUE0 = (rgb(0,98,172,255, maxColorValue=255))
GOLD0 = (rgb(172,181,0,255, maxColorValue=255))

Then I provide code for functions.

randomDistribution <- function(maxCols, maxRep, cat1, replaceNumber = 0, cat2 = cat1)
{
  # Start every word (column x) with maxRep productions of cat1
  distro = tibble(x = c(1:maxCols), y = list(rep(cat1, maxRep)))
  # Optionally swap cat2 productions into randomly chosen words
  # (not used by the calls below, which rely on randomOrder instead)
  for (i in sample(1:maxCols, replaceNumber, replace = TRUE))
  {
    distro$y[[i]] <- tail(append(distro$y[[i]], cat2), maxRep)
  }
  # Flatten into one row per production: word (x), repetition (n), label (y)
  distroTibble = tibble(x = c(1:(maxCols * maxRep)), n = 1, y = "")
  for (i in c(1:maxCols))
  {
    for (j in c(1:maxRep))
    {
      distroTibble$x[((i-1)*maxRep)+j] = i
      distroTibble$n[((i-1)*maxRep)+j] = j
      distroTibble$y[((i-1)*maxRep)+j] = distro$y[[i]][j]
    }
  }
  return(distroTibble)
}

randomOrder <- function(distro) {
  # Randomly relabel half of the productions as "d", then compute the
  # percentage of each variant by word (x)
  half <- floor(nrow(distro) / 2)
  distro <- distro %>%
    mutate(line = row_number()) %>%
    mutate(y = case_when(line %in% sample(line)[1:half] ~ "d", TRUE ~ y)) %>%
    ungroup() %>% group_by(x, y) %>% summarize(count = n()) %>%
    mutate(perc = count/sum(count)) %>% ungroup() %>%
    arrange(y, desc(perc)) %>% mutate(x = factor(x, levels=unique(x))) %>%
    arrange(desc(perc))
  return(distro)
}

And now for the data itself. I build four tables with 20 words (x values) and 10 recordings (n values) each, with the recorded variant stored in the “y” value. I start by labelling all of these “t”, then randomly select half of the productions and relabel them “d”. I then compute the percentage of each variant by word (x).

I also combine the four speakers, and do the same for all of them.

D1 <- randomDistribution(20,10,"t")
D2 <- randomDistribution(20,10,"t")
D3 <- randomDistribution(20,10,"t")
D4 <- randomDistribution(20,10,"t")
D5 <- bind_rows(D1,D2,D3,D4)

D1 = randomOrder(D1)
D2 = randomOrder(D2)
D3 = randomOrder(D3)
D4 = randomOrder(D4)
D5 = randomOrder(D5)

Now I plot a distribution graph for all of them. Note that some words are mostly one type of production (“d”), and others are mostly the other production (“t”). This inevitably occurs by random chance. And it differs by participant.

However, even when you pool all the participant data, you see the same result. This distribution is simply part of how randomization works, and needs no explanation beyond random chance itself.

D1 %>% ggplot(aes(x=x, fill=y, y=perc)) + geom_bar(stat="identity") + scale_y_continuous(labels=scales::percent) + ggtitle("group 1")

D2 %>% ggplot(aes(x=x, fill=y, y=perc)) + geom_bar(stat="identity") + scale_y_continuous(labels=scales::percent) + ggtitle("group 2")

D3 %>% ggplot(aes(x=x, fill=y, y=perc)) + geom_bar(stat="identity") + scale_y_continuous(labels=scales::percent) + ggtitle("group 3")

D4 %>% ggplot(aes(x=x, fill=y, y=perc)) + geom_bar(stat="identity") + scale_y_continuous(labels=scales::percent) + ggtitle("group 4")

D5 %>% ggplot(aes(x=x, fill=y, y=perc)) + geom_bar(stat="identity") + scale_y_continuous(labels=scales::percent) + ggtitle("all groups")

And you can see that the combined data from all four speakers still shows some words with almost no “d” productions, and some words with very few “t” productions.

Because a purely random distribution will generate individual words with few or even no tokens of a particular variant, even across speakers, you cannot use differences in the distributions by themselves to identify any meaningful patterns.
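A quick back-of-envelope check with the binomial distribution makes the same point:

# With 10 recordings per word and a fair 50/50 choice, the chance that a
# single word gets 8 or more "t" productions is about 5.5%
pExtreme = pbinom(7, size = 10, prob = 0.5, lower.tail = FALSE)
pExtreme
# so across 20 independent words, at least one such lopsided word turns up
# more often than not (roughly two times in three)
1 - (1 - pExtreme)^20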

And that is the “coin toss” tutorial for Linguists. The main takeaway message is that you need minimal pairs, or at least minimal environments, to establish evidence that a distribution of two phonetic outputs could be phonemic.

Even then, the existence of a phonemic distinction doesn’t mean it predicts very many examples in speech.

Tutorial 3: K-means clustering

One of the easiest and most appropriate methods for testing whether a data set contains multiple categories is k-means clustering. The technique can be supervised, in that you tell the computer how many clusters you think are in the data. However, it is much wiser to choose the number of clusters using an unsupervised process, and here I show three ways of doing so. The first is the “elbow” method, which runs k-means for a range of cluster counts and produces a graph that visually lets you see what the ideal number of clusters is: you identify it by finding the “bend” in the elbow. Here’s some code for generating a very distinct binary cluster and running the elbow test.
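The code relies on a helper function, min.f1f2, and two semi-transparent colour values, GOLD0A and BLUE0A, that are not shown in this post. A minimal sketch of plausible definitions, assuming min.f1f2 is the pointwise minimum of the two normal densities (the integrand of Weitzman’s overlapping coefficient) and the colours are alpha variants of the gold and blue defined in Tutorial 4 above:

# Pointwise minimum of two normal densities; integrating this over the real
# line gives Weitzman's overlapping coefficient.
min.f1f2 <- function(x, mu1, mu2, sd1, sd2)
{
  pmin(dnorm(x, mean = mu1, sd = sd1), dnorm(x, mean = mu2, sd = sd2))
}

# Assumed semi-transparent versions of the gold and blue used elsewhere
GOLD0A = rgb(172, 181, 0, 127, maxColorValue = 255)
BLUE0A = rgb(0, 98, 172, 127, maxColorValue = 255)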

library(tidyverse)
library(factoextra)
library(cluster)
library(scales)   # for the percent() labels used in the plots below
points = 10000
sd1 = 1
sd2 = 1
mu1 = 0
mu2 = 6
p=integrate(min.f1f2, -Inf, Inf, mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)

G1 <- tibble(X = rnorm(points, mean = mu1, sd = sd1),
Y = rnorm(points, mean = 0, sd = sd1),
Name="Group 1", col = GOLD0A,Shape=1)

G2 <- tibble(X = rnorm(points, mean = mu2, sd = sd2),
Y = rnorm(points, mean = 0, sd = sd2),
Name="Group 2", col = BLUE0A,Shape=2)

G <- bind_rows(G1,G2)
# Overlap range: share of each group's points that cross past the other
# group's nearest extreme point (averaged over the two groups below)
p2 = length(G$X[G$Name=="Group 1" & G$X> min(G$X[G$Name=="Group 2"])])/points

p2 = p2 + length(G$X[G$Name=="Group 2" & G$X< max(G$X[G$Name=="Group 1"])])/points
p2 = p2/2
fviz_nbclust(G[, 1:2], kmeans, method = "wss")

The second technique, the silhouette method, will tell you the answer directly, identifying the peak average “silhouette width” with a handy dashed line.

fviz_nbclust(G[, 1:2], kmeans, method = "silhouette")

The third shows the “gap” statistic, with the suggested number of clusters identified.

gap_stat <- clusGap(G[, 1:2], FUN = kmeans, nstart = 25, K.max = 10, B = 50)
fviz_gap_stat(gap_stat)

As you can see, all three cluster identification techniques show that the ideal number of clusters is 2, which makes sense, because that is the number of clusters we initially generated.

Here I show you what the difference between the real clusters and the estimated clusters looks like, beginning with the real clusters.

G %>% ggplot(aes(x = X, y = Y)) +
geom_point(aes(colour = Name), show.legend = TRUE) +
scale_color_manual(values=c(GOLD0A,BLUE0A)) +
xlab(paste("Overlap percent = ",percent(as.numeric(p[1])), " : Overlap range = ", percent(p2),sep="")) + ylab("") + coord_equal(ratio=1)

Followed by the k-means cluster.

set.seed(20)
binaryCluster <- kmeans(G[, 1:2], 2, nstart = 10, algorithm="Lloyd")
binaryCluster$cluster <- as.factor(binaryCluster$cluster)
# Colour each point by its assigned k-means cluster
binaryCluster$color[binaryCluster$cluster == 1] = GOLD0A
binaryCluster$color[binaryCluster$cluster == 2] = BLUE0A
G$col2 = binaryCluster$color
G %>% ggplot(aes(x = X, y = Y)) +
geom_point(aes(color = col2), show.legend = TRUE) +
scale_color_manual(values=c(GOLD0A,BLUE0A)) +
xlab("Unsupervised binary separation") + ylab("") + coord_equal(ratio=1)

Notice that the unsupervised clustering will mis-categorize some items in the cluster, but gets most of them correct.
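One way to check the mis-categorization rate is to cross-tabulate the k-means labels against the true group names, allowing for the arbitrary numbering of the clusters. A minimal sketch:

# Rows: true groups; columns: k-means clusters
tab = table(G$Name, binaryCluster$cluster)
# Accuracy under whichever labelling of the two clusters fits better
accuracy = max(sum(diag(tab)), sum(tab) - sum(diag(tab))) / sum(tab)
1 - accuracy  # proportion of points mis-categorized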

Here we generate a binary separated by 4 standard deviations.

points = 10000
sd1 = 1
sd2 = 1
mu1 = 0
mu2 = 4
p=integrate(min.f1f2, -Inf, Inf, mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)
G1 <- tibble(X = rnorm(points, mean = mu1, sd = sd1), Y = rnorm(points, mean = 0, sd = sd1), Name="Group 1", col = GOLD0A,Shape=1)

G2 <- tibble(X = rnorm(points, mean = mu2, sd = sd2), Y = rnorm(points, mean = 0, sd = sd2), Name="Group 2", col = BLUE0A,Shape=2)

G <- bind_rows(G1,G2)
p2 = length(G$X[G$Name=="Group 1" & G$X> min(G$X[G$Name=="Group 2"])])/points
p2 = p2 + length(G$X[G$Name=="Group 2" & G$X< max(G$X[G$Name=="Group 1"])])/points
p2 = p2/2

Notice that even with 4 standard deviations separating the groups, the elbow technique still clearly diagnoses 2 clusters – a binary system.

fviz_nbclust(G[, 1:2], kmeans, method = "wss")

fviz_nbclust(G[, 1:2], kmeans, method = "silhouette")

gap_stat <- clusGap(G[, 1:2], FUN = kmeans, nstart = 10, K.max = 10, B = 50)
fviz_gap_stat(gap_stat)

And here is the underlying cluster with overlapped entries.

G %>% ggplot(aes(x = X, y = Y)) +
geom_point(aes(colour = Name), show.legend = TRUE) +
scale_color_manual(values=c(GOLD0A,BLUE0A)) +
xlab(paste("Overlap percent = ",percent(as.numeric(p[1])),
" : Overlap range = ",percent(p2),sep="")) + ylab("") + coord_equal(ratio=1)

Notice that the cluster analysis misidentifies many entries – about 5% of them.

set.seed(20)
binaryCluster <- kmeans(G[, 1:2], 2, nstart = 10, algorithm="Lloyd")
binaryCluster$cluster <- as.factor(binaryCluster$cluster)
binaryCluster$color[binaryCluster$cluster == 1] = GOLD0A
binaryCluster$color[binaryCluster$cluster == 2] = BLUE0A
G$col2 = binaryCluster$color
G %>% ggplot(aes(x = X, y = Y)) +
geom_point(aes(color = col2), show.legend = TRUE) +
scale_color_manual(values=c(GOLD0A,BLUE0A)) +
xlab("Unsupervised binary separation") + ylab("") + coord_equal(ratio=1)

Lastly, here is a binary that is only separated by 2 standard deviations. A barely noticeable binary.

points = 10000
sd1 = 1
sd2 = 1
mu1 = 0
mu2 = 2
p=integrate(min.f1f2, -Inf, Inf, mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)
G1 <- tibble(X = rnorm(points, mean = mu1, sd = sd1), Y = rnorm(points, mean = 0, sd = sd1), Name="Group 1", col = GOLD0A,Shape=1)

G2 <- tibble(X = rnorm(points, mean = mu2, sd = sd2), Y = rnorm(points, mean = 0, sd = sd2), Name="Group 2", col = BLUE0A,Shape=2)

G <- bind_rows(G1,G2)
p2 = length(G$X[G$Name=="Group 1" & G$X> min(G$X[G$Name=="Group 2"])])/points
p2 = p2 + length(G$X[G$Name=="Group 2" & G$X< max(G$X[G$Name=="Group 1"])])/points
p2 = p2/2

Notice that even with 2 standard deviations separating the groups, the elbow technique DOES diagnose that this is a binary system, but barely. The silhouette and gap techniques also point to a binary.

library(factoextra)
fviz_nbclust(G[, 1:2], kmeans, method = "wss")

fviz_nbclust(G[, 1:2], kmeans, method = "silhouette")

gap_stat <- clusGap(G[, 1:2], FUN = kmeans, nstart = 10, K.max = 10, B = 50)
fviz_gap_stat(gap_stat)

Here you can see the underlying binary division.

G %>% ggplot(aes(x = X, y = Y)) +
geom_point(aes(colour = Name), show.legend = TRUE) +
scale_color_manual(values=c(GOLD0A,BLUE0A)) +
xlab(paste("Overlap percent = ",percent(as.numeric(p[1]))," : Overlap range = ",percent(p2),sep="")) +
ylab("") + coord_equal(ratio=1)

And as you would expect, oh boy does the k-means clustering make mistakes.

set.seed(20)
binaryCluster <- kmeans(G[, 1:2], 2, nstart = 10, algorithm="Lloyd")
binaryCluster$cluster <- as.factor(binaryCluster$cluster)
binaryCluster$color[binaryCluster$cluster == 1] = GOLD0A
binaryCluster$color[binaryCluster$cluster == 2] = BLUE0A
G$col2 = binaryCluster$color
G %>% ggplot(aes(x = X, y = Y)) +
geom_point(aes(color = col2), show.legend = TRUE) +
scale_color_manual(values=c(GOLD0A,BLUE0A)) +
xlab("Unsupervised binary separation") +
ylab("") + coord_equal(ratio=1)

However, k-means clustering can still uncover the binary.

References:

Weitzman, M. S. (1970). Measures of overlap of income distributions of white and Negro families in the United States. Washington: U.S. Bureau of the Census.

https://afit-r.github.io/kmeans_clustering

https://rpubs.com/williamsurles/310847

University of Canterbury Open Day

The University of Canterbury held this year’s Open Day on Thursday, July 11, 2019. It was a chance for high-school students to look at possible majors at our University. This year I had the chance to showcase UC Linguistics, and I brought along our ultrasound machine to show people images of my tongue in motion, and let them see their tongues. A few were intimidated by the idea of seeing their own tongues on a machine, but lots of young students participated, and hopefully got a bit of a taste for Linguistics and especially phonetic research.

However, next year I will try to build more materials to address all the ways linguistics can be useful to students. I like the fact that Linguistics is both arts and science at the same time. You learn to write, you learn numeracy, you learn statistics, and you learn how to do experiments. And on top of that, our students learn how to speak in public and speak well. These are exceedingly useful skills, and have led students to continue in research, get positions with Stats NZ, build up computer research in local companies, and so much more.

Tutorial 2: Overlapping binaries.

Having previously demonstrated what two binary groupings look like when they are separated by six standard deviations, here I demonstrate what they look like when separated by four standard deviations. Such a binary has an overlapping coefficient of 4.55%, as the output below shows, computed by integration based on Weitzman’s overlapping distribution.

## 0.04550026 with absolute error < 3.8e-05
## [1] "4.55%"
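The number above can be reproduced with a short sketch, assuming (as in Tutorial 3) that min.f1f2 is the pointwise minimum of the two normal densities, which integrates to Weitzman’s overlapping coefficient:

min.f1f2 <- function(x, mu1, mu2, sd1, sd2)
{
  pmin(dnorm(x, mean = mu1, sd = sd1), dnorm(x, mean = mu2, sd = sd2))
}
# Two unit-variance normal distributions with means 4 standard deviations apart
p = integrate(min.f1f2, -Inf, Inf, mu1 = 0, mu2 = 4, sd1 = 1, sd2 = 1)
scales::percent(p$value, accuracy = 0.01)  # "4.55%"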

This is what such data looks like graphed in a density curve.
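A minimal sketch of this kind of density plot (the exact plotting code in the original RMarkdown may differ):

library(tidyverse)
points = 10000
# Two unit-variance normal groups whose means sit 4 standard deviations apart
G <- bind_rows(tibble(X = rnorm(points, mean = 0, sd = 1), Name = "Group 1"),
               tibble(X = rnorm(points, mean = 4, sd = 1), Name = "Group 2"))
G %>% ggplot(aes(x = X, fill = Name)) + geom_density(alpha = 0.5)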

The overlap range is now much larger, as can be seen in the scatterplot below.

Now let’s look at two groups separated by 2 standard deviations.

## 0.3173105 with absolute error < 4.7e-05
## [1] "31.73%"

The density plot now overlaps a lot.

And this is what the scatterplot looks like.

Now look at the scatterplot without color differences. At this point there is the barest of hints that there might be a binary in this system at all.

Let us compare that to the initial binary, separated by 6 standard deviations, now in grey.


With this data, the binary remains visible and obvious even when both samples are grey.

However, even if you cannot observe categories by directly looking, there are tools that can help identify N-ary categories in what looks to us like gradient data – the tools of unsupervised cluster analysis, which I will discuss in the next tutorial.

The RMarkdown file used to generate this post can be found here. Some of the code was modified from code on this site.

References:

Weitzman, M. S. (1970). Measures of overlap of income distributions of white and Negro families in the United States. Washington: U.S. Bureau of the Census.

Tutorial 1: Gradient effects within binary systems

This post provides a visual example of gradient behaviour within a univariate binary system.

Here I demonstrate what two binary groupings look like when each group has a standard deviation of 1 on a non-dimensional scale and the two groups are separated by 6 standard deviations. Such a binary has an overlapping coefficient of 0.27%, as the output below shows, computed by integration based on Weitzman’s overlapping distribution.

## [1] "0.27%"

But the overlapping coefficient hides the fact that in a group of, say, 10,000 tokens for each member of the binary, overlapping outliers still appear, and sometimes individual tokens look like they belong firmly in the other category – like the one blue dot in the gold cloud. (Note that the y-axis is added to make the display easier to understand, but provides none of the data used in this analysis.)
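A minimal sketch of the kind of scatterplot described here (the random y-axis is purely for display, as noted above):

library(tidyverse)
points = 10000
# Two unit-variance normal groups whose means sit 6 standard deviations apart
G <- bind_rows(tibble(X = rnorm(points, mean = 0, sd = 1), Name = "Group 1"),
               tibble(X = rnorm(points, mean = 6, sd = 1), Name = "Group 2"))
# Y carries none of the data used in the analysis; it just spreads the points out
G$Y = rnorm(2 * points, mean = 0, sd = 1)
G %>% ggplot(aes(x = X, y = Y, colour = Name)) + geom_point(show.legend = TRUE) + coord_equal(ratio = 1)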

In short, in a binary system, individual tokens that fall squarely within the other category’s range will occur due to simple random variation, yet they are neither evidence of continuous gradient overlap nor evidence against the existence of the binary. Such tokens occur whenever the two categories are close enough relative to the number of examples – close enough being determined by simple probability, even in a univariate system (one without outside influences).

The RMarkdown file used to generate this post can be found here. Some of the code was modified from code on this site.

References:

Weitzman, M. S. (1970). Measures of overlap of income distributions of white and Negro families in the United States. Washington: U.S. Bureau of the Census.

Visual-tactile Speech Perception and the Autism Quotient

Katie Bicevskis, Bryan Gick, and I recently published “Visual-tactile Speech Perception and the Autism Quotient” in Frontiers in Communication: Language Sciences. In this article, we demonstrated that the more people self-describe as having autistic-spectrum traits, the more temporal separation they tolerate between air flow hitting the skin and lip opening in a video of someone saying an ambiguous “ba” or “pa”, when identifying the syllable they saw and felt, but did not hear.

In an earlier publication, we showed that visual-tactile speech integration depends on this alignment of lip opening and airflow, and that this is evidence of modality-neutral speech primitives: we use whatever information we have during speech perception, regardless of whether we see, feel, or hear it.

Summary results from Bicevskis et al. (2016), as seen in Derrick et al. (2019).

This result is best illustrated with the image above. The image shows a kind of topographical map, where white represents the “mountaintop” of people saying the ambiguous visual-tactile syllable is a “pa”, and green represents the “valley” of people saying the ambiguous visual-tactile syllable is a “ba”. On the X-axis is the alignment of the onset of air-flow release and lip opening. On the Y-axis is the participants’ Autism-spectrum Quotient. Lower numbers represent people who describe themselves as having the fewest autistic-like traits, that is, the most neurotypical. At the bottom of the scale, perceivers identify the ambiguous syllables as “pa” with as much as 70-75% likelihood when the air flow arrived 100-150 milliseconds after lip opening – about when it would arrive if a speaker stood 30-45 cm away from the perceiver. Deviations led to steep dropoffs, with perceivers identifying the syllable as “pa” only 20-30% of the time if the air flow arrived 300 milliseconds before the lip opening. In contrast, at the top of the AQ scale, perceivers reported “pa” only about 5% more often when the visual-tactile alignment was closer to that experienced in typical speech.

Interaction between visual-tactile alignment and Autism-spectrum Quotient.

These results are very similar to what happens with people who are on the autism spectrum in audio-visual speech. Autists listen to speech with their ears more than they look with their eyes, showing weak multisensory coherence during perceptual tasks (Happé and Frith, 2006). Our results suggest such weak coherence extends into the neurotypical population, and can be measured in tasks where the sensory modalities are well balanced (which is easier to do in speech when audio is removed).

References:

Bicevskis, K., Derrick, D., and Gick, B. (2016). Visual-tactile integration in speech perception: Evidence for modality neutral speech primitives. Journal of the Acoustical Society of America, 140(5):3531–3539

Derrick, D., Bicevskis, K., and Gick, B. (2019). Visual-tactile speech perception and the autism quotient. Frontiers in Communication – Language Sciences, 3(61):1–11

Derrick, D., Anderson, P., Gick, B., and Green, S. (2009). Characteristics of air puffs produced in English ‘pa’: Experiments and simulations. Journal of the Acoustical Society of America, 125(4):2272–2281

Happé, F., and Frith, U. (2006). The Weak Coherence Account: Detail-focused Cognitive Style in Autism Spectrum Disorders. Journal of Autism and Developmental Disorders, 36(1):5-25

The pitfalls of audio-visuo-tactile research

I am going to be submitting an article entitled “Tri-modal Speech: Audio-Visual-Tactile integration in Speech Perception”, along with my co-authors Doreen Hansmann and Catherine Theys, within the month. The article was, in the end, a success, demonstrating that visual and tactile speech can, separately and jointly, enhance or interfere with accurate auditory syllable identification in two-way forced-choice experiments.

However, I am writing this short post to serve as a warning to anyone who wishes to combine visual, tactile, and auditory speech perception research into one experiment. Today’s technology makes that exceedingly difficult:

The three of us have collective experience with electroencephalography, magnetic resonance imaging, and combining ultrasound imaging of the tongue with electromagnetic articulometry. These are complex tasks that require a great deal of skill and training to complete successfully. Yet this paper’s research was the most technically demanding and error-prone task we have ever encountered. The reason is that, despite all of the video you see online today, modern computers do not easily allow for research-grade, synchronized video within experimental software. Due to today’s multi-core central processing, it was in fact easier to do such things 15 years ago than it is now. The number and variety of computer bugs in the operating system, video and audio library codecs, and experimental software presentation libraries were utterly overwhelming.

We programmed this experiment in PsychoPy2, and after several rewrites and switching between a number of visual and audio codecs, we were forced to abandon the platform entirely due to unfixable intermittent crashes, and switch to MatLab and PsychToolBox. PsychToolBox also had several issues, but with several days of system debugging effort by Johnathan Wiltshire, programmer analyst at the University of Canterbury’s psychology department, these issues were at least resolvable. We cannot thank Johnathan enough! In addition, electrical issues with our own air flow system made completion of this research a daunting task, requiring a lot of help and repairs from Scott Lloyd of Electrical Engineering. Scott did a lot of burdensome work for us, and we are grateful.

All told, I alone lost almost 100 working days to debugging and repair efforts during this experiment. We therefore recommend all those who follow up on this research make sure that they have collaborators with backgrounds in both engineering and information technology, work in labs with technical support, and have budgets and people who can and will build electrically robust equipment. We also recommend not just testing, debugging, and piloting experiments, but also the generation of automated iterative tools that can identify and allow the resolution of uncommon intermittent errors.

Your mental health depends on you following this advice.

Preliminary Report: Visual-tactile Speech Perception and the Autism Quotient

Katie Bicevskis, Bryan Gick, and I just had “Visual-tactile Speech Perception and the Autism Quotient” – our reexamination and expansion of our evidence for ecologically valid visual-tactile speech perception – accepted to Frontiers in Communication: Language Sciences. Right now only the abstract and introductory parts are online, but the whole article will be up soon. The major contribution of this article is that speech perceivers integrate air flow information during visual speech perception with greater reliance upon event-related accuracy the more they self-describe as neurotypical. This behaviour supports the Happé & Frith (2006) weak coherence account of Autism Spectrum Disorder. Put very simply, neurotypical people perceive whole events, but people with ASD perceive uni-sensory parts of events, often with greater detail than their neurotypical counterparts. This account partially explains how autists can have deficiencies in imagination and social skills, but also be extremely capable in other areas of inquiry. Previous models of ASD offered an explanation of disability; Happé and Frith offer an explanation of different ability.

I will be expanding on this discussion, with a plain English explanation of the results, once the article is fully published.  For now, the article abstract is re-posted here:

“Multisensory information is integrated asymmetrically in speech perception: An audio signal can follow video by 240 milliseconds, but can precede video by only 60 ms, without disrupting the sense of synchronicity (Munhall et al., 1996). Similarly, air flow can follow either audio (Gick et al., 2010) or video (Bicevskis et al., 2016) by a much larger margin than it can precede either while remaining perceptually synchronous. These asymmetric windows of integration have been attributed to the physical properties of the signals; light travels faster than sound (Munhall et al., 1996), and sound travels faster than air flow (Gick et al., 2010). Perceptual windows of integration narrow during development (Hillock-Dunn and Wallace, 2012), but remain wider among people with autism (Wallace and Stevenson, 2014). Here we show that, even among neurotypical adult perceivers, visual-tactile windows of integration are wider and flatter the higher the participant’s Autism Quotient (AQ) (Baron-Cohen et al, 2001), a self-report screening test for Autism Spectrum Disorder (ASD). As ‘pa’ is produced with a tiny burst of aspiration (Derrick et al., 2009), we applied light and inaudible air puffs to participants’ necks while they watched silent videos of a person saying ‘ba’ or ‘pa’, with puffs presented both synchronously and at varying degrees of asynchrony relative to the recorded plosive release burst, which itself is time-aligned to visible lip opening. All syllables seen along with cutaneous air puffs were more likely to be perceived as ‘pa’. Syllables were perceived as ‘pa’ most often when the air puff occurred 50-100 ms after lip opening, with decaying probability as asynchrony increased. Integration was less dependent on time-alignment the higher the participant’s AQ. Perceivers integrate event-relevant tactile information in visual speech perception with greater reliance upon event-related accuracy the more they self-describe as neurotypical, supporting the Happé & Frith (2006) weak coherence account of ASD.”

Feldmann’s “Do Linguistic Structures Affect Human Capital?”: Rebuttal is better than suppression.

There is a move afoot to have Kyklos retract “Do Linguistic Structures Affect Human Capital? The Case of Pronoun Drop”, by Prof Horst Feldmann of the University of Bath, because Feldmann used faulty statistical reasoning to argue that language structure influences economic wealth.

There are two main flaws: 1) The assumption that pro-drop languages are categorically different from non-pro-drop languages in the first place. I have never seen a formal language model that suggests such a thing, though functional models likely allow for the possibility. (*Edit: a colleague privately told me of a formal model that does categorize pro-drop and non-pro-drop languages differently, but I will not discuss it further, as they do not want to discuss the issue publicly.) 2) The assumption that languages are all equally independent from each other. This is definitely wrong: it is obvious on many levels that English and French are, for instance, more similar than English and Japanese, by both lineage and organization. Taking the second flaw into account might seriously alter any statistical model used to analyze the world language data used in Feldmann’s article.

However, I do not support this effort to demand Kyklos retract his article. It is much better to write an article that reexamines the data, using properly applied and properly reasoned statistical analysis, and rebuts Feldmann’s points if they are shown to be incorrect.

Once you go down the road of demanding that articles be retracted, not due to fraud or utter falsehood, but instead due to what you consider bad analysis, you’ve gone too far. I am morally gutted that any of my fellow linguists believe they can fight bad argumentation through suppression rather than effective counter-argument, and I repudiate such efforts.

Now, to be honest about myself and my limitations, I mostly ignore Economists when they talk about Linguistics in an Economics journal.  Just as they might do were I to talk about Economics in a Linguistics journal.  However, if any of my readers feels strongly enough to want to see the article retracted, here is my advice:  It is much better to simply argue against the ideas, preferably using better statistical models, and write a great article while doing so.  And if you do it well enough, you’ll really help your own career as well. 

If your reanalysis shows Feldmann is thoroughly wrong, say so, and say it as forcefully as you want. But, be prepared to end up possibly agreeing with some of what Feldmann had to say. This outcome is possible as you don’t really know what a thorough analysis would show in advance of running the data.  And if you think you can know in advance with certainty (rather than just strongly suspect) you might need to improve your scientific acumen.