Jingkuan Song

SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism

Jul 02, 2025
Via arXiv

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction

May 26, 2025
Via arXiv

Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach

May 22, 2025
Via arXiv

InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning

May 20, 2025
Via arXiv

Policy Contrastive Decoding for Robotic Foundation Models

May 19, 2025
Via arXiv

Towards Generalized and Training-Free Text-Guided Semantic Manipulation

Apr 24, 2025
Via arXiv

Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation

Mar 11, 2025
Via arXiv

Scale-Aware Pre-Training for Human-Centric Visual Perception: Enabling Lightweight and Generalizable Models

Mar 11, 2025
Via arXiv

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves

Dec 16, 2024
Via arXiv

GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark

Dec 13, 2024
Via arXiv