Wenqiang Zhang

Fudan University

VLA-Hijack: A Transferable Patch Attack against Vision-Language-Action Models via Visual Proprioception Hijacking

May 27, 2026 (via arXiv)

Unified Multimodal Visual Tracking with Dual Mixture-of-Experts

May 05, 2026 (via arXiv)

Cognition-Inspired Dual-Stream Semantic Enhancement for Vision-Based Dynamic Emotion Modeling

Apr 14, 2026 (via arXiv)

ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception

Apr 14, 2026 (via arXiv)

The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results

Apr 13, 2026 (via arXiv)

EmoScene: A Dual-space Dataset for Controllable Affective Image Generation

Apr 01, 2026 (via arXiv)

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Mar 23, 2026 (via arXiv)

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning

Jan 26, 2026 (via arXiv)

Adaptive Attention Distillation for Robust Few-Shot Segmentation under Environmental Perturbations

Jan 07, 2026 (via arXiv)

RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations

Dec 30, 2025 (via arXiv)