Zhou Zhao

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models

May 23, 2023
Ziyue Jiang, Qian Yang, Jialong Zuo, Zhenhui Ye, Rongjie Huang, Yi Ren, Zhou Zhao

Via arXiv

Connecting Multi-modal Contrastive Representations

May 22, 2023
Zehan Wang, Yang Zhao, Xize Cheng, Haifeng Huang, Jiageng Liu, Li Tang, Linjun Li, Yongqi Wang, Aoxiong Yin, Ziang Zhang, Zhou Zhao

Via arXiv

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

May 22, 2023
Huadai Liu, Rongjie Huang, Xuan Lin, Wenqiang Xu, Maozong Zheng, Hong Chen, Jinzheng He, Zhou Zhao

Via arXiv

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing

May 21, 2023
Huadai Liu, Rongjie Huang, Jinzheng He, Gang Sun, Ran Shen, Xize Cheng, Zhou Zhao

Via arXiv

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

May 18, 2023
Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao

Via arXiv

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis

May 18, 2023
Jinzheng He, Jinglin Liu, Zhenhui Ye, Rongjie Huang, Chenye Cui, Huadai Liu, Zhou Zhao

Via arXiv

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

May 13, 2023
Ruiqi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao

Via arXiv

ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos

May 04, 2023
Zhou Yu, Lixiang Zheng, Zhou Zhao, Fei Wu, Jianping Fan, Kui Ren, Jun Yu

Via arXiv

Denoising Multi-modal Sequential Recommenders with Contrastive Learning

May 03, 2023
Dong Yao, Shengyu Zhang, Zhou Zhao, Jieming Zhu, Wenqiao Zhang, Rui Zhang, Xiaofei He, Fei Wu

Via arXiv

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

May 01, 2023
Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao

Via arXiv