Multi-hypothesis tracking of the tongue surface in ultrasound video recordings of normal and impaired speech
Characterizing tongue shape and motion, as they appear in real-time ultrasound (US) images, is of interest to the study of healthy and impaired speech production. Quantitative anlaysis of tongue shape and motion requires that the tongue surface be extracted in each frame of US speech recordings. While the literature proposes several automated methods for this purpose, these either require large or very well matched training sets, or lack robustness in the presence of rapid tongue motion. This paper presents a new robust method for tongue tracking in US images that combines simple tongue shape and motion models derived from a small training data set with a highly flexible active contour (snake) representation and maintains multiple possible hypotheses as to the correct tongue contour via a particle filtering algorithm. The method was tested on a database of large free speech recordings from healthy and impaired speakers and its accuracy was measured against the manual segmentations obtained for every image in the database. The proposed method achieved mean sum of distances errors of 1.69 ± 1.10 mm, and its accuracy was not highly sensitive to training set composition. Furthermore, the proposed method showed improved accuracy, both in terms of mean sum of distances error and in terms of linguistically meaningful shape indices, compared to the three publicly available tongue tracking software packages Edgetrak, TongueTrack and Autotrace.