"speech": models, code, and papers

Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling

Jan 22, 2024
Bruno Korbar, Jaesung Huh, Andrew Zisserman

On Robustness to Missing Video for Audiovisual Speech Recognition

Dec 19, 2023
Oscar Chang, Otavio Braga, Hank Liao, Dmitriy Serdyuk, Olivier Siohan

Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings

Jan 29, 2024
He Zhao, Hangting Chen, Jianwei Yu, Yuehai Wang

Acoustic Local Positioning With Encoded Emission Beacons

Feb 04, 2024
Jesus Urena, Alvaro Hernandez, Juan Jesus Garcia, Jose Manuel Villadangos, Maria del Carmen Perez, David Gualda, Fernando J. Alvarez, Teodoro Aguilera

Consistency Based Unsupervised Self-training For ASR Personalisation

Jan 22, 2024
Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung

Seq2seq for Automatic Paraphasia Detection in Aphasic Speech

Dec 16, 2023
Matthew Perez, Duc Le, Amrit Romana, Elise Jones, Keli Licata, Emily Mower Provost

MCMChaos: Improvising Rap Music with MCMC Methods and Chaos Theory

Jan 15, 2024
Robert G. Kimelman

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters

Feb 01, 2024
Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti

Efficient Training Spiking Neural Networks with Parallel Spiking Unit

Feb 01, 2024
Yang Li, Yinqian Sun, Xiang He, Yiting Dong, Dongcheng Zhao, Yi Zeng

PAM: Prompting Audio-Language Models for Audio Quality Assessment

Feb 01, 2024
Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang
