Yuexian Zou

SSVMR: Saliency-based Self-training for Video-Music Retrieval

Feb 18, 2023
Via arXiv

Generating Templated Caption for Video Grounding

Jan 15, 2023
Via arXiv

Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation

Dec 24, 2022
Via arXiv

M3ST: Mix at Three Levels for Speech Translation

Dec 07, 2022
Via arXiv

Aligning Source Visual and Target Language Domains for Unpaired Video Captioning

Nov 22, 2022
Via arXiv

A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding

Nov 08, 2022
Via arXiv

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS

Nov 04, 2022
Via arXiv

DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention

Oct 28, 2022
Via arXiv

Video Referring Expression Comprehension via Transformer with Content-aware Query

Oct 06, 2022
Via arXiv

Correspondence Matters for Video Referring Expression Comprehension

Jul 21, 2022
Via arXiv