Shiyu Zhou

Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire

Nov 17, 2022
Zhiyun Fan, Zhenlin Liang, Linhao Dong, Yi Liu, Shiyu Zhou, Meng Cai, Jun Zhang, Zejun Ma, Bo Xu

Improving End-to-End Contextual Speech Recognition with Fine-grained Contextual Knowledge Selection

Jan 30, 2022
Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

Jul 06, 2021
Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo, Zijia Zhao, Mingzhen Sun, Weining Wang, Hanqing Lu, Shiyu Zhou, Jiajun Zhang, Jinqiao Wang

Long-Running Speech Recognizer: An End-to-End Multi-Task Learning Framework for Online ASR and VAD

Mar 02, 2021
Meng Li, Shiyu Zhou, Bo Xu

Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition

Jan 24, 2021
Cheng Yi, Shiyu Zhou, Bo Xu

Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages

Jan 17, 2021
Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu

Fusing Wav2vec2.0 and BERT into End-to-end Model for Low-resource Speech Recognition

Jan 17, 2021
Cheng Yi, Shiyu Zhou, Bo Xu

Exploring wav2vec 2.0 on speaker verification and language identification

Jan 14, 2021
Zhiyun Fan, Meng Li, Shiyu Zhou, Bo Xu

Applying wav2vec2.0 to Speech Recognition in various low-resource languages

Dec 22, 2020
Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu

CIF-based collaborative decoding for end-to-end contextual speech recognition

Dec 17, 2020
Minglun Han, Linhao Dong, Shiyu Zhou, Bo Xu
