Zhaoyu Chen

VLA-Hijack: A Transferable Patch Attack against Vision-Language-Action Models via Visual Proprioception Hijacking

May 27, 2026
Via arXiv

Unified Multimodal Visual Tracking with Dual Mixture-of-Experts

May 05, 2026
Via arXiv

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning

Jan 26, 2026
Via arXiv

Forging a Dynamic Memory: Retrieval-Guided Continual Learning for Generalist Medical Foundation Models

Dec 15, 2025
Via arXiv

Seeing is Believing: Rich-Context Hallucination Detection for MLLMs via Backward Visual Grounding

Nov 15, 2025
Via arXiv

Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection

Nov 14, 2025
Via arXiv

Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks for Enhanced Action Understanding

Aug 10, 2025
Via arXiv

LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops

Jun 17, 2025
Via arXiv

dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis

Mar 13, 2025
Via arXiv

MMARD: Improving the Min-Max Optimization Process in Adversarial Robustness Distillation

Mar 09, 2025
Via arXiv