"speech": models, code, and papers

BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0

Dec 21, 2023
Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

Via arXiv

Revisiting the Entropy Semiring for Neural Speech Recognition

Dec 19, 2023
Oscar Chang, Dongseong Hwang, Olivier Siohan

Via arXiv

Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization

Jan 23, 2024
Wei-Ping Huang, Sung-Feng Huang, Hung-yi Lee

Via arXiv

Efficient Training Spiking Neural Networks with Parallel Spiking Unit

Feb 02, 2024
Yang Li, Yinqian Sun, Xiang He, Yiting Dong, Dongcheng Zhao, Yi Zeng

Via arXiv

Soft Alignment of Modality Space for End-to-end Speech Translation

Dec 18, 2023
Yuhao Zhang, Kaiqi Kou, Bei Li, Chen Xu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

Via arXiv

Acoustic Local Positioning With Encoded Emission Beacons

Feb 04, 2024
Jesus Urena, Alvaro Hernandez, Juan Jesus Garcia, Jose Manuel Villadangos, Maria del Carmen Perez, David Gualda, Fernando J. Alvarez, Teodoro Aguilera

Via arXiv

Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings

Jan 29, 2024
He Zhao, Hangting Chen, Jianwei Yu, Yuehai Wang

Via arXiv

Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling

Jan 22, 2024
Bruno Korbar, Jaesung Huh, Andrew Zisserman

Via arXiv

PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques

Jan 04, 2024
Tzu-Han Lin, How-Shing Wang, Hao-Yung Weng, Kuang-Chen Peng, Zih-Ching Chen, Hung-yi Lee

Via arXiv

Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization

Jan 16, 2024
Ming Cheng, Ming Li

Via arXiv