"speech": models, code, and papers

Cosine Scoring with Uncertainty for Neural Speaker Embedding

Mar 11, 2024
Qiongqiong Wang, Kong Aik Lee

Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach

Mar 11, 2024
Amit Eliav, Sharon Gannot

Prosody for Intuitive Robotic Interface Design: It's Not What You Said, It's How You Said It

Mar 13, 2024
Elaheh Sanoubari, Atil Iscen, Leila Takayama, Stefano Saliceti, Corbin Cunningham, Ken Caluwaerts

Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing

Feb 21, 2024
Freda Shi, Kevin Gimpel, Karen Livescu

SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning

Feb 26, 2024
Luca Zampierin, Ghouthi Boukli Hacene, Bac Nguyen, Mirco Ravanelli

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Feb 25, 2024
Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro

Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation

Mar 14, 2024
Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations

Feb 10, 2024
Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain

An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data

Mar 12, 2024
Yudong Yang, Rongfeng Su, Xiaokang Liu, Nan Yan, Lan Wang

Pushing the Limits of Zero-shot End-to-End Speech Translation

Feb 16, 2024
Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà
