Shiliang Zhang

SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus

Sep 12, 2023
Haoxu Wang, Fan Yu, Xian Shi, Yuezhang Wang, Shiliang Zhang, Ming Li

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

Aug 16, 2023
Xian Shi, Yexin Yang, Zerui Li, Shiliang Zhang

MixBCT: Towards Self-Adapting Backward-Compatible Training

Aug 14, 2023
Yu Liang, Shiliang Zhang, Yaowei Wang, Sheng Xiao, Kenli Li, Xiaoyu Wang

Rethinking the visual cues in audio-visual speaker extraction

Jun 05, 2023
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang, Shiliang Zhang

Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition

May 30, 2023
Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang, Xiaobao Wang, Shiliang Zhang

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

May 25, 2023
Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

May 23, 2023
Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie

CASA-ASR: Context-Aware Speaker-Attributed ASR

May 21, 2023
Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai

Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction

May 21, 2023
Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai
