Zejun Ma

SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR

Mar 04, 2024
Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma

SLIT: Boosting Audio-Text Pre-Training via Multi-Stage Learning and Instruction Tuning

Feb 20, 2024
Hang Zhao, Yifei Xin, Zhesong Yu, Bilei Zhu, Lu Lu, Zejun Ma

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis

Jan 20, 2024
Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao

Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer

Nov 15, 2023
Jin Qiu, Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Oct 20, 2023
Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Oct 10, 2023
Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Connecting Speech Encoder and Large Language Model for ASR

Sep 26, 2023
Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

Jul 14, 2023
Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Chen Zhang, Zhenhui Ye, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

Jun 27, 2023
Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu, Chunfeng Wang, Yi Ren, Xiang Yin, Zejun Ma
