Yuexian Zou

ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation

Mar 11, 2023
Via arXiv

Improving Weakly Supervised Sound Event Detection with Causal Intervention

Mar 10, 2023
Via arXiv

SSVMR: Saliency-based Self-training for Video-Music Retrieval

Feb 18, 2023
Via arXiv

Generating Templated Caption for Video Grounding

Jan 15, 2023
Via arXiv

Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation

Dec 24, 2022
Via arXiv

M3ST: Mix at Three Levels for Speech Translation

Dec 07, 2022
Via arXiv

Aligning Source Visual and Target Language Domains for Unpaired Video Captioning

Nov 22, 2022
Via arXiv

A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding

Nov 08, 2022
Via arXiv

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS

Nov 04, 2022
Via arXiv

DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention

Oct 28, 2022
Via arXiv