Lianli Gao

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction

May 26, 2025

Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach

May 22, 2025

InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning

May 20, 2025

Policy Contrastive Decoding for Robotic Foundation Models

May 19, 2025

Towards Generalized and Training-Free Text-Guided Semantic Manipulation

Apr 24, 2025

Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation

Mar 11, 2025

Scale-Aware Pre-Training for Human-Centric Visual Perception: Enabling Lightweight and Generalizable Models

Mar 11, 2025

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves

Dec 16, 2024

GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark

Dec 13, 2024

Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models

Nov 19, 2024