Xiaojuan Qi

NoteIt: A System Converting Instructional Videos to Interactable Notes Through Multimodal Video Understanding

Aug 20, 2025
Via arXiv

Understanding Data Influence with Differential Approximation

Aug 20, 2025
Via arXiv

S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix

Aug 11, 2025
Via arXiv

Aligning Effective Tokens with Video Anomaly in Large Language Models

Aug 08, 2025
Via arXiv

Trustworthy Tree-based Machine Learning by $MoS_2$ Flash-based Analog CAM with Inherent Soft Boundaries

Jul 16, 2025
Via arXiv

Scaling RL to Long Videos

Jul 10, 2025
Via arXiv

Holistic Tokenizer for Autoregressive Image Generation

Jul 03, 2025
Via arXiv

UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation

May 30, 2025
Via arXiv

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

May 19, 2025
Via arXiv

DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation

Mar 19, 2025
Via arXiv