"speech": models, code, and papers

SPRING-INX: A Multilingual Indian Language Speech Corpus by SPRING Lab, IIT Madras

Oct 24, 2023
Nithya R, Malavika S, Jordan F, Arjun Gangwar, Metilda N J, S Umesh, Rithik Sarab, Akhilesh Kumar Dubey, Govind Divakaran, Samudra Vijaya K, Suryakanth V Gangashetty

Subspace Hybrid MVDR Beamforming for Augmented Hearing

Nov 30, 2023
Sina Hafezi, Alastair H. Moore, Pierre H. Guiraud, Patrick A. Naylor, Jacob Donley, Vladimir Tourbabin, Thomas Lunner

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

Sep 27, 2023
Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang

Modular Customizable ROS-Based Framework for Rapid Development of Social Robots

Nov 27, 2023
Mahta Akhyani, Hadi Moradi

FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

Dec 13, 2023
Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

A Review of Hybrid and Ensemble in Deep Learning for Natural Language Processing

Dec 09, 2023
Jianguo Jia, Wen Liang, Youzhi Liang

GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition

Nov 08, 2023
Daniel Galvez, Tim Kaldewey

Learning Co-Speech Gesture for Multimodal Aphasia Type Detection

Oct 18, 2023
Daeun Lee, Sejung Son, Hyolim Jeon, Seungbae Kim, Jinyoung Han

Unified speech and gesture synthesis using flow matching

Oct 08, 2023
Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra

Nov 20, 2023
Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
