Yifei Xin

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Jun 11, 2024

SLIT: Boosting Audio-Text Pre-Training via Multi-Stage Learning and Instruction Tuning

Feb 20, 2024

Masked Audio Modeling with CLAP and Multi-Objective Learning

Jan 29, 2024

Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions

Jul 28, 2023

Improving Text-Audio Retrieval by Text-aware Attention Pooling and Prior Matrix Revised Loss

Mar 19, 2023

Improving Weakly Supervised Sound Event Detection with Causal Intervention

Mar 10, 2023

Improving Speech Enhancement via Event-based Query

Feb 24, 2023