Siqi Zheng

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

Mar 29, 2024
Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

Via arXiv

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

Feb 13, 2024
Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

Via arXiv

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token Based ASR

Nov 08, 2023
Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

Via arXiv

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

Oct 11, 2023
Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

Via arXiv

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

Sep 19, 2023
Luyao Cheng, Siqi Zheng, Qinglin Zhang, Hui Wang, Yafeng Chen, Qian Chen, Shiliang Zhang

Via arXiv

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

Sep 14, 2023
Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng

Via arXiv

Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision

Aug 20, 2023
Yafeng Chen, Siqi Zheng, Qian Chen

Via arXiv

Ensemble Distillation Network: Learning Robust Speaker Representations without Supervision

Aug 05, 2023
Yafeng Chen, Siqi Zheng, Qian Chen

Via arXiv

Improving BERT with Hybrid Pooling Network and Drop Mask

Jul 14, 2023
Qian Chen, Wen Wang, Qinglin Zhang, Chong Deng, Ma Yukun, Siqi Zheng

Via arXiv

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

Jun 28, 2023
Siqi Zheng, Luyao Cheng, Yafeng Chen, Hui Wang, Qian Chen

Via arXiv