Xiaojuan Qi

Self-Evaluation Unlocks Any-Step Text-to-Image Generation

Dec 26, 2025

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Dec 23, 2025

ASSIST-3D: Adapted Scene Synthesis for Class-Agnostic 3D Instance Segmentation

Dec 10, 2025

Efficient lattice field theory simulation using adaptive normalizing flow on a resistive memory-based neural differential equation solver

Sep 16, 2025

NoteIt: A System Converting Instructional Videos to Interactable Notes Through Multimodal Video Understanding

Aug 20, 2025

Understanding Data Influence with Differential Approximation

Aug 20, 2025

S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix

Aug 11, 2025

Aligning Effective Tokens with Video Anomaly in Large Language Models

Aug 08, 2025

Trustworthy Tree-based Machine Learning by $MoS_2$ Flash-based Analog CAM with Inherent Soft Boundaries

Jul 16, 2025

Scaling RL to Long Videos

Jul 10, 2025