Pierre-André Vuissoz

IADI

Acoustic-to-articulatory Inversion of the Complete Vocal Tract from RT-MRI with Various Audio Embeddings and Dataset Sizes

Mar 30, 2026

Sofiane Azzouz, Pierre-André Vuissoz, Yves Laprie

Abstract: Acoustic-to-articulatory inversion strongly depends on the type of data used. While most previous studies rely on electromagnetic articulography (EMA), which is limited by the number of sensors and restricted to accessible articulators, we propose an approach aiming at a complete inversion of the vocal tract, from the glottis to the lips. To this end, we used approximately 3.5 hours of RT-MRI data from a single speaker. The innovation of our approach lies in the use of articulator contours automatically extracted from MRI images, rather than the raw images themselves. By focusing on these contours, the model prioritizes the essential geometric dynamics of the vocal tract while discarding redundant pixel-level information. These contours, alongside denoised audio, were then processed using a Bi-LSTM architecture. Two experiments were conducted: (1) an analysis of the impact of the audio embedding, for which three types of embeddings were evaluated as input to the model (MFCCs, LCCs, and HuBERT), and (2) a study of the influence of the dataset size, which we varied from 10 minutes to 3.5 hours. Evaluation was performed on the test data using RMSE, median error, and Tract Variables, to which we added one measurement: the larynx height. The average RMSE obtained is 1.48 mm, below the pixel size of 1.62 mm. These results confirm the feasibility of a complete vocal-tract inversion using RT-MRI data.
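The two scalar metrics named in the abstract can be illustrated with a minimal sketch. It assumes that predicted and reference contours are lists of corresponding (x, y) points in millimetres and that errors are point-to-point Euclidean distances; the abstract does not state how the authors pair points or aggregate errors, so the function names and the toy contours below are hypothetical. Only the 1.48 mm and 1.62 mm figures come from the abstract.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in millimetres (assumed layout)


def point_errors(predicted: List[Point], reference: List[Point]) -> List[float]:
    """Euclidean distance in mm between corresponding contour points."""
    if len(predicted) != len(reference):
        raise ValueError("contours must have the same number of points")
    return [math.dist(p, r) for p, r in zip(predicted, reference)]


def rmse(errors: List[float]) -> float:
    """Root mean square of a list of per-point errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Toy contours (hypothetical values, not data from the paper).
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
reference = [(0.0, 1.0), (1.0, 2.0), (2.0, 2.0)]

errors = point_errors(predicted, reference)
contour_rmse = rmse(errors)
contour_median = median(errors)

# Expressing the RMSE reported in the abstract in pixel units.
PIXEL_SIZE_MM = 1.62
REPORTED_RMSE_MM = 1.48
reported_rmse_px = REPORTED_RMSE_MM / PIXEL_SIZE_MM  # a little over 0.9 pixel
```

Dividing by the pixel size makes the comparison in the abstract explicit: an error below one pixel is smaller than the resolution of the images the contours were extracted from.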

Acoustic-to-Articulatory Inversion of Clean Speech Using an MRI-Trained Model

Mar 12, 2026

Sofiane Azzouz, Pierre-André Vuissoz, Yves Laprie

Abstract: Articulatory acoustic inversion reconstructs vocal tract shapes from speech. Real-time magnetic resonance imaging (rt-MRI) allows simultaneous acquisition of both the acoustic speech signal and articulatory information. Besides the complexity of rt-MRI acquisition, the recorded audio is heavily corrupted by scanner noise and requires denoising to be usable. For practical use, it must be possible to invert speech recorded without MRI noise. In this study, we investigate the use of speech recorded in a clean acoustic environment as an alternative to denoised MRI speech. To this end, we compare two signals from the same speaker with identical sentences, which are aligned using phonetic segmentation. A model trained on denoised MRI speech is evaluated on both denoised MRI and clean speech. We also assess a model trained and tested only on clean speech. Results show that clean speech supports articulatory inversion effectively, achieving an RMSE of 1.56 mm, close to MRI-based performance.

Reconstruction of the Vocal Tract from Speech via Phonetic Representations Using MRI Data

Mar 12, 2026

Sofiane Azzouz, Pierre-André Vuissoz, Yves Laprie

Abstract: Articulatory acoustic inversion aims to reconstruct the complete geometry of the vocal tract from the speech signal. In this paper, we present a comparative study of several levels of phonetic segmentation accuracy, together with a comparison to the baseline introduced in our previous work, which is based on Mel-Frequency Cepstral Coefficients (MFCCs). All the approaches considered are based on a denoised speech signal and aim to investigate the impact of incorporating phonetic information through three successive levels: an uncorrected automatic transcription, a temporally aligned phonetic segmentation, and an expert manual correction following alignment. The models are trained to predict articulatory contours extracted from vocal tract MRI images using an automatic contour tracking method. The results show that, among the models relying on phonetic representations, manual correction after alignment yields the best performance, approaching that of the baseline.

Automatic Tongue Delineation from MRI Images with a Convolutional Neural Network Approach

Dec 06, 2024

Karyna Isaieva, Yves Laprie, Nicolas Turpault, Alexis Houssard, Jacques Felblinger, Pierre-André Vuissoz

Abstract: Tongue contour extraction from real-time magnetic resonance images is a nontrivial task due to the presence of artifacts manifesting in the form of blurring or ghost contours. In this work, we present results of automatic tongue delineation achieved by means of a U-Net auto-encoder convolutional neural network. We present both intra- and inter-subject validation. We used real-time magnetic resonance images and manually annotated 1-pixel-wide contours as inputs. Predicted probability maps were post-processed in order to obtain 1-pixel-wide tongue contours. The results are very good and slightly outperform published results on automatic tongue segmentation.

* Applied Artificial Intelligence, 2020, 34 (14), pp.1115-1123

Complete reconstruction of the tongue contour through acoustic to articulatory inversion using real-time MRI data

Nov 04, 2024

Sofiane Azzouz, Pierre-André Vuissoz, Yves Laprie

Abstract: Acoustic articulatory inversion is a major processing challenge, with a wide range of applications from speech synthesis to feedback systems for language learning and rehabilitation. In recent years, deep learning methods have been applied to the inversion of fewer than a dozen geometrical positions corresponding to sensors glued to easily accessible articulators. It is therefore impossible to know the shape of the whole tongue from root to tip. In this work, we use high-quality real-time MRI data to track the contour of the tongue. The data used to drive the inversion are therefore the unstructured speech signal and the tongue contours. Several architectures relying on a Bi-LSTM have been explored, with or without an autoencoder to reduce the dimensionality of the latent space, and with or without the phonetic segmentation. The results show that the tongue contour can be recovered with a median accuracy of 2.21 mm (or 1.37 pixels) taking a context of 1 MFCC frame (static, delta and double-delta cepstral features).
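The "static, delta and double-delta" input mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' feature pipeline: it uses the common regression-based delta formula with a window of 2 frames and edge-frame repetition, neither of which is specified in the abstract, and the function names and toy input are hypothetical.

```python
from typing import List

Frame = List[float]  # one frame of cepstral coefficients


def delta(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based delta features; edge frames are repeated as padding."""
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t in range(len(frames)):
        row: Frame = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_static_delta_ddelta(frames: List[Frame]) -> List[Frame]:
    """Concatenate static, delta and double-delta coefficients per frame."""
    d = delta(frames)
    dd = delta(d)
    return [s + a + b for s, a, b in zip(frames, d, dd)]


# Toy input: a single coefficient rising linearly over five frames.
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
features = stack_static_delta_ddelta(static)
middle_frame = features[2]  # [static, delta, double-delta] for frame 2
```

For the linear ramp, the middle frame has a delta of 1 (the slope) and a double-delta of 0, which is what first and second derivatives of a straight line should give. Each stacked frame is three times the length of the static frame.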

Extraction of 3D trajectories of mandibular condyles from 2D real-time MRI

Jun 21, 2024

Karyna Isaieva, Justine Leclère, Guillaume Paillart, Guillaume Drouot, Jacques Felblinger, Xavier Dubernard, Pierre-André Vuissoz

Abstract: Computing the trajectories of mandibular condyles directly from MRI could provide a comprehensive examination, allowing for the extraction of both anatomical and kinematic details. This study aimed to investigate the feasibility of extracting 3D condylar trajectories from 2D real-time MRI and to assess their precision. Twenty healthy subjects underwent real-time MRI while opening and closing their jaws. One axial and two sagittal slices were segmented using a U-Net-based algorithm. The centers of mass of the resulting masks were projected onto the coordinate system based on anatomical markers and temporally adjusted using a common projection. The quality of the computed trajectories was evaluated using metrics designed to estimate movement reproducibility, head motion, and slice placement symmetry. The segmentation of the axial slices demonstrated good-to-excellent quality; however, the segmentation of the sagittal slices required some fine-tuning. The movement reproducibility was acceptable for most cases; nevertheless, head motion displaced the trajectories by 1 mm on average. The difference in the superior-inferior coordinate of the condyles in the closed jaw position was 1.7 mm on average. Despite limitations in precision, real-time MRI enables the extraction of condylar trajectories with sufficient accuracy for evaluating clinically relevant parameters such as condyle displacement, trajectory shape, and symmetry.
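The center-of-mass step described in the abstract can be sketched for a single binary segmentation mask. This is a minimal illustration, assuming a mask stored as rows of 0/1 values and an isotropic pixel spacing; the spacing value, the function names, and the toy mask are hypothetical, and the projection onto the anatomical coordinate system is not covered.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 inside the segmented condyle, else 0


def center_of_mass(mask: Mask) -> Tuple[float, float]:
    """Mean (row, column) position of the nonzero pixels of a binary mask."""
    row_sum = col_sum = count = 0
    for r, line in enumerate(mask):
        for c, value in enumerate(line):
            if value:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        raise ValueError("mask is empty; no center of mass")
    return row_sum / count, col_sum / count


def to_millimetres(point_px: Tuple[float, float], spacing_mm: float) -> Tuple[float, float]:
    """Scale pixel coordinates by an isotropic pixel spacing (assumed)."""
    return point_px[0] * spacing_mm, point_px[1] * spacing_mm


# Toy 4x5 mask with a 2x2 segmented region (hypothetical data).
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
com_px = center_of_mass(mask)
com_mm = to_millimetres(com_px, spacing_mm=2.0)  # spacing chosen for illustration
```

Applying this to the mask of each time frame yields one point per frame, and the sequence of points forms the in-plane trajectory of the segmented structure.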
