Yuexian Zou

Video Referring Expression Comprehension via Transformer with Content-conditioned Query

Oct 25, 2023

NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement

Sep 03, 2023

MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning

Aug 25, 2023

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory

Aug 18, 2023

Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions

Jul 28, 2023

Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels

Jul 05, 2023

Customizing General-Purpose Foundation Models for Medical Report Generation

Jun 09, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec

May 07, 2023

Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation

Apr 05, 2023

TLAG: An Informative Trigger and Label-Aware Knowledge Guided Model for Dialogue-based Relation Extraction

Mar 30, 2023