Yuxuan Wang

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Aug 10, 2023
Haohe Liu, Qiao Tian, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley

Separate Anything You Describe

Aug 09, 2023
Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning

Jun 14, 2023
Jianghui Wang, Yuxuan Wang, Dongyan Zhao, Zilong Zheng

PolyVoice: Language Models for Speech to Speech Translation

Jun 13, 2023
Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang

Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency

Jun 05, 2023
Yuxuan Wang, Hong Lyu

Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training

May 30, 2023
Yuxuan Wang, Jianghui Wang, Dongyan Zhao, Zilong Zheng

VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions

May 30, 2023
Yuxuan Wang, Zilong Zheng, Xueliang Zhao, Jinpeng Li, Yueqian Wang, Dongyan Zhao

Efficient Neural Music Generation

May 25, 2023
Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

Language-universal phonetic encoder for low-resource speech recognition

May 19, 2023
Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang
