Yuexian Zou

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

Mar 30, 2023

Improving Text-Audio Retrieval by Text-aware Attention Pooling and Prior Matrix Revised Loss

Mar 19, 2023

PoseRAC: Pose Saliency Transformer for Repetitive Action Counting

Mar 16, 2023

FiTs: Fine-grained Two-stage Training for Knowledge-aware Question Answering

Mar 15, 2023

FTM: A Frame-level Timeline Modeling Method for Temporal Graph Representation Learning

Mar 15, 2023

Improve Retrieval-based Dialogue System via Syntax-Informed Attention

Mar 12, 2023

ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation

Mar 11, 2023

Improving Weakly Supervised Sound Event Detection with Causal Intervention

Mar 10, 2023

SSVMR: Saliency-based Self-training for Video-Music Retrieval

Feb 18, 2023

Generating Templated Caption for Video Grounding

Jan 15, 2023