Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

D. Muhammad Noorul Mubarak

Tracking Articulatory Dynamics in Speech with a Fixed-Weight BiLSTM-CNN Architecture

Apr 25, 2025

Leena G Pillai, D. Muhammad Noorul Mubarak, Elizabeth Sherly

Abstract:Speech production is a complex sequential process which involve the coordination of various articulatory features. Among them tongue being a highly versatile active articulator responsible for shaping airflow to produce targeted speech sounds that are intellectual, clear, and distinct. This paper presents a novel approach for predicting tongue and lip articulatory features involved in a given speech acoustics using a stacked Bidirectional Long Short-Term Memory (BiLSTM) architecture, combined with a one-dimensional Convolutional Neural Network (CNN) for post-processing with fixed weights initialization. The proposed network is trained with two datasets consisting of simultaneously recorded speech and Electromagnetic Articulography (EMA) datasets, each introducing variations in terms of geographical origin, linguistic characteristics, phonetic diversity, and recording equipment. The performance of the model is assessed in Speaker Dependent (SD), Speaker Independent (SI), corpus dependent (CD) and cross corpus (CC) modes. Experimental results indicate that the proposed model with fixed weights approach outperformed the adaptive weights initialization with in relatively minimal number of training epochs. These findings contribute to the development of robust and efficient models for articulatory feature prediction, paving the way for advancements in speech production research and applications.

* 10 pages with 8 figures. This paper presented in an international Conference

Via

Access Paper or Ask Questions

Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope

Apr 17, 2025

Leena G Pillai, D. Muhammad Noorul Mubarak

Figure 1 for Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope

Figure 2 for Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope

Figure 3 for Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope

Figure 4 for Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope

Abstract:This review is focused on the data-driven approaches applied in different applications of Acoustic-to-Articulatory Inversion (AAI) of speech. This review paper considered the relevant works published in the last ten years (2011-2021). The selection criteria includes (a) type of AAI - Speaker Dependent and Speaker Independent AAI, (b) objectives of the work - Articulatory approximation, Articulatory Feature space selection and Automatic Speech Recognition (ASR), explore the correlation between acoustic and articulatory features, and framework for Computer-assisted language training, (c) Corpus - Simultaneously recorded speech (wav) and medical imaging models such as ElectroMagnetic Articulography (EMA), Electropalatography (EPG), Laryngography, Electroglottography (EGG), X-ray Cineradiography, Ultrasound, and real-time Magnetic Resonance Imaging (rtMRI), (d) Methods or models - recent works are considered, and therefore all the works are based on machine learning, (e) Evaluation - as AAI is a non-linear regression problem, the performance evaluation is mostly done by Correlation Coefficient (CC), Root Mean Square Error (RMSE), and also considered Mean Square Error (MSE), and Mean Format Error (MFE). The practical application of the AAI model can provide a better and user-friendly interpretable image feedback system of articulatory positions, especially tongue movement. Such trajectory feedback system can be used to provide phonetic, language, and speech therapy for pathological subjects.

* This is a review paper about Acoustic to Articulatory inversion of speech, presented in an international conference. This paper has 8 pages and 2 figures

Via

Access Paper or Ask Questions