Xinxiao Wu

LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization

May 30, 2025

VUDG: A Dataset for Video Understanding Domain Generalization

May 30, 2025

METOR: A Unified Framework for Mutual Enhancement of Objects and Relationships in Open-vocabulary Video Visual Relationship Detection

May 10, 2025

TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

Apr 17, 2025

Video Summarization using Denoising Diffusion Probabilistic Model

Dec 12, 2024

How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey

Dec 11, 2024

Storyboard guided Alignment for Fine-grained Video Action Recognition

Oct 18, 2024

Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning

Mar 02, 2024

DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification

May 25, 2023

Meta-causal Learning for Single Domain Generalization

Apr 07, 2023