ETIS, A*STAR, IPAL
Abstract: Understanding how structured sequence information can be represented and generalized in neural systems is key to modeling the transition from acoustic input to emergent structure. In this study, we propose a rank-order-based neural network inspired by the STG-LIFG-PMC pathway, modeling both the bottom-up transition from acoustic input to abstract rank representation and the top-down generation from that representation to motor execution. Building on previous work in rank coding, we first demonstrate that this model efficiently compresses input while retaining the capacity to reconstruct full utterances from partial cues, revealing an emergent structure-sensitive generation process that reflects context-general representations of sensorimotor states, which are later shaped into context-specific motor plans during speech planning. We then show that the network exhibits global-level novelty detection similar to the P3b novelty wave, replicating the global-sequence-sensitive mechanism. As a supplement, we also compare the model's behavior under local (index-level) and global (rank-level) perturbations, revealing robustness to superficial variation and sensitivity to abstract structural violations, key features associated with proto-syntactic generalization. These results suggest that rank-order coding not only serves as a compact encoding scheme but also supports the encoding of hierarchical grammar.
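The core idea of rank coding, abstracting away magnitudes and keeping only the ordering of components, can be illustrated with a minimal sketch (the function name and the toy vectors are illustrative, not from the paper): two acoustically different inputs that share the same rank structure collapse onto the same code, which is the kind of context-general representation the abstract describes.

```python
def rank_code(x):
    """Encode a vector by the rank order of its components (0 = largest).

    Inputs that differ in magnitude but share the same ordering map to
    the same code, illustrating the abstraction rank coding provides.
    """
    # Indices of components sorted by descending value.
    order = sorted(range(len(x)), key=lambda i: -x[i])
    # Invert the permutation: ranks[i] is the rank of component i.
    ranks = [0] * len(x)
    for r, i in enumerate(order):
        ranks[i] = r
    return tuple(ranks)

# Two differently scaled inputs with identical structure share one code.
a = rank_code([0.9, 0.1, 0.5])
b = rank_code([3.0, 0.2, 1.1])
# a == b == (0, 2, 1)
```

A local (index-level) perturbation that preserves the ordering leaves the code unchanged, while a global (rank-level) perturbation changes it, mirroring the robustness/sensitivity contrast reported in the abstract.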

Abstract: Voiced Electromyography (EMG)-to-Speech (V-ETS) models reconstruct speech from muscle activity signals, facilitating applications such as neurolaryngologic diagnostics. Despite its potential, the advancement of V-ETS is hindered by a scarcity of paired EMG-speech data. To address this, we propose a novel Confidence-based Multi-Speaker Self-training (CoM2S) approach, along with a newly curated Libri-EMG dataset. The approach leverages synthetic EMG data generated by a pre-trained model, filtered by a phoneme-level confidence mechanism, to enhance the ETS model through self-training. Experiments demonstrate that our method improves phoneme accuracy, reduces phonological confusion, and lowers word error rate, confirming the effectiveness of CoM2S for V-ETS. In support of future research, we will release the code and the proposed Libri-EMG dataset, an open-access, time-aligned, multi-speaker collection of voiced EMG and speech recordings.
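The confidence-based filtering step can be sketched as follows; the field names and the threshold are hypothetical placeholders, since the abstract does not specify how per-phoneme confidences are aggregated. This sketch keeps only synthetic samples whose weakest phoneme-level confidence clears the threshold:

```python
def filter_by_phoneme_confidence(samples, threshold=0.8):
    """Keep synthetic EMG-speech pairs whose lowest phoneme-level
    confidence meets the threshold.

    `samples` is a list of dicts with a 'phoneme_conf' list of
    per-phoneme posterior probabilities (hypothetical field names,
    for illustration only).
    """
    return [s for s in samples if min(s["phoneme_conf"]) >= threshold]

pool = [
    {"id": "syn-001", "phoneme_conf": [0.95, 0.91, 0.88]},
    {"id": "syn-002", "phoneme_conf": [0.97, 0.42, 0.90]},  # one uncertain phoneme
]
kept = filter_by_phoneme_confidence(pool)
# kept contains only "syn-001"
```

Gating on the minimum rather than the mean is one plausible design choice: a single low-confidence phoneme is enough to reject a sample, which matches the goal of reducing phonological confusion during self-training.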
Abstract: Understanding how infants perceive speech sounds and language structures is still an open problem. Previous research in artificial neural networks has mainly focused on large, dataset-dependent generative models, aiming to replicate language-related phenomena such as "perceptual narrowing". In this paper, we propose a novel approach using a small generative neural network equipped with a continual learning mechanism based on predictive coding for mono- and bilingual speech sound learning (referred to as language sound acquisition during the "critical period") and a compositional optimization mechanism for generation in which no learning is involved (sound imitation in later infancy). Our model prioritizes interpretability and demonstrates the advantages of online learning: unlike deep networks requiring substantial offline training, our model continuously updates with new data, making it adaptable and responsive to changing inputs. Through experiments, we demonstrate that if second language acquisition occurs in later infancy, the challenges associated with learning a foreign language after the critical period amplify, replicating the perceptual narrowing effect.