Kun Yuan

EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy

May 21, 2025
Via arXiv

ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling

May 19, 2025
Via arXiv

Diffusion Learning with Partial Agent Participation and Local Updates

May 16, 2025
Via arXiv

NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: KwaiSR Dataset and Study

Apr 21, 2025
Via arXiv

Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery

Apr 02, 2025
Via arXiv

Optimal Complexity in Byzantine-Robust Distributed Stochastic Optimization with Data Heterogeneity

Mar 20, 2025
Via arXiv

KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception

Mar 13, 2025
Via arXiv

MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

Mar 04, 2025
Via arXiv

A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models

Feb 11, 2025
Via arXiv

Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment

Feb 04, 2025
Via arXiv