Alert button
Picture for Xize Cheng

Xize Cheng

Alert button

Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

Add code
Bookmark button
Alert button
Apr 16, 2024
Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

Viaarxiv icon

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

Add code
Bookmark button
Alert button
Dec 23, 2023
Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, changpeng yang, Zhou Zhao

Viaarxiv icon

Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers

Add code
Bookmark button
Alert button
Dec 15, 2023
Haifeng Huang, Zehan Wang, Rongjie Huang, Luping Liu, Xize Cheng, Yang Zhao, Tao Jin, Zhou Zhao

Figure 1 for Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
Figure 2 for Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
Figure 3 for Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
Figure 4 for Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
Viaarxiv icon

3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding

Add code
Bookmark button
Alert button
Jul 25, 2023
Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao

Figure 1 for 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
Figure 2 for 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
Figure 3 for 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
Figure 4 for 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
Viaarxiv icon

Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding

Add code
Bookmark button
Alert button
Jul 18, 2023
Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao

Figure 1 for Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
Figure 2 for Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
Figure 3 for Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
Figure 4 for Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
Viaarxiv icon

OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment

Add code
Bookmark button
Alert button
Jun 10, 2023
Xize Cheng, Tao Jin, Linjun Li, Wang Lin, Xinyu Duan, Zhou Zhao

Figure 1 for OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
Figure 2 for OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
Figure 3 for OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
Figure 4 for OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
Viaarxiv icon

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

Add code
Bookmark button
Alert button
May 24, 2023
Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao

Figure 1 for AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Figure 2 for AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Figure 3 for AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Figure 4 for AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Viaarxiv icon

Connecting Multi-modal Contrastive Representations

Add code
Bookmark button
Alert button
May 22, 2023
Zehan Wang, Yang Zhao, Xize Cheng, Haifeng Huang, Jiageng Liu, Li Tang, Linjun Li, Yongqi Wang, Aoxiong Yin, Ziang Zhang, Zhou Zhao

Figure 1 for Connecting Multi-modal Contrastive Representations
Figure 2 for Connecting Multi-modal Contrastive Representations
Figure 3 for Connecting Multi-modal Contrastive Representations
Figure 4 for Connecting Multi-modal Contrastive Representations
Viaarxiv icon

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing

Add code
Bookmark button
Alert button
May 21, 2023
Huadai Liu, Rongjie Huang, Jinzheng He, Gang Sun, Ran Shen, Xize Cheng, Zhou Zhao

Figure 1 for Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Figure 2 for Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Figure 3 for Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Figure 4 for Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Viaarxiv icon

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

Add code
Bookmark button
Alert button
Mar 09, 2023
Xize Cheng, Linjun Li, Tao Jin, Rongjie Huang, Wang Lin, Zehan Wang, Huangdai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao

Figure 1 for MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Figure 2 for MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Figure 3 for MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Figure 4 for MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Viaarxiv icon