Yuxuan Cai

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Jun 16, 2025

Omni-AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented for Efficient Long Video Understanding

Jun 16, 2025

Task-Core Memory Management and Consolidation for Long-term Continual Learning

May 15, 2025

Omni-AD: Learning to Reconstruct Global and Local Features for Multi-class Anomaly Detection

Mar 27, 2025

Towards Robust and Reliable Concept Representations: Reliability-Enhanced Concept Embedding Model

Feb 03, 2025

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation

Dec 02, 2024

Fleximo: Towards Flexible Text-to-Human Motion Video Generation

Nov 29, 2024

Improving Multi-Subject Consistency in Open-Domain Image Generation with Isolation and Reposition Attention

Nov 28, 2024

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Nov 24, 2024

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Oct 21, 2024