"speech recognition": models, code, and papers

Tensor decomposition for minimization of E2E SLU model toward on-device processing

Jun 02, 2023
Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe

Via arXiv

Bridging the Granularity Gap for Acoustic Modeling

May 27, 2023
Chen Xu, Yuhao Zhang, Chengbo Jiao, Xiaoqian Liu, Chi Hu, Xin Zeng, Tong Xiao, Anxiang Ma, Huizhen Wang, JingBo Zhu

Via arXiv

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

May 30, 2023
Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

Via arXiv

Self-supervised representations in speech-based depression detection

May 20, 2023
Wen Wu, Chao Zhang, Philip C. Woodland

Via arXiv

CB-Conformer: Contextual biasing Conformer for biased word recognition

Apr 19, 2023
Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng

Via arXiv

Large-scale unsupervised audio pre-training for video-to-speech synthesis

Jun 27, 2023
Triantafyllos Kefalas, Yannis Panagakis, Maja Pantic

Via arXiv

Joint Speech Recognition and Audio Captioning

Feb 03, 2022
Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe

Via arXiv

Enhancing Speech Recognition Decoding via Layer Aggregation

Apr 05, 2022
Tomer Wullach, Shlomo E. Chazan

Via arXiv

Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings

Jun 30, 2023
Ilyass Hammouamri, Ismail Khalfaoui-Hassani, Timothée Masquelier

Via arXiv

Encoder-decoder multimodal speaker change detection

Jun 01, 2023
Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Young-ki Kwon, Minjae Lee, Bong-Jin Lee

Via arXiv