Picture for Xiaohan Wang

Xiaohan Wang

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Add code
Nov 11, 2025
Viaarxiv icon

From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory

Add code
Nov 11, 2025
Viaarxiv icon

AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond

Add code
Sep 30, 2025
Viaarxiv icon

Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation

Add code
Aug 28, 2025
Figure 1 for Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation
Figure 2 for Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation
Figure 3 for Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation
Figure 4 for Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation
Viaarxiv icon

Topology Guidance: Controlling the Outputs of Generative Models via Vector Field Topology

Add code
May 11, 2025
Figure 1 for Topology Guidance: Controlling the Outputs of Generative Models via Vector Field Topology
Figure 2 for Topology Guidance: Controlling the Outputs of Generative Models via Vector Field Topology
Figure 3 for Topology Guidance: Controlling the Outputs of Generative Models via Vector Field Topology
Figure 4 for Topology Guidance: Controlling the Outputs of Generative Models via Vector Field Topology
Viaarxiv icon

Video Action Differencing

Add code
Mar 10, 2025
Viaarxiv icon

SurgiSAM2: Fine-tuning a foundational model for surgical video anatomy segmentation and detection

Add code
Mar 05, 2025
Viaarxiv icon

Temporal Preference Optimization for Long-Form Video Understanding

Add code
Jan 23, 2025
Figure 1 for Temporal Preference Optimization for Long-Form Video Understanding
Figure 2 for Temporal Preference Optimization for Long-Form Video Understanding
Figure 3 for Temporal Preference Optimization for Long-Form Video Understanding
Figure 4 for Temporal Preference Optimization for Long-Form Video Understanding
Viaarxiv icon

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Add code
Jan 14, 2025
Viaarxiv icon

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Add code
Jan 06, 2025
Figure 1 for Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Figure 2 for Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Figure 3 for Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Figure 4 for Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Viaarxiv icon