Le Wang

Xi'an Jiaotong University

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Jun 24, 2025

AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions

Jun 17, 2025

Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation

Jun 11, 2025

FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation

Jun 10, 2025

From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval

Apr 25, 2025

RSRNav: Reasoning Spatial Relationship for Image-Goal Navigation

Apr 25, 2025

Manipulating Multimodal Agents via Cross-Modal Prompt Injection

Apr 22, 2025

Moment Quantization for Video Temporal Grounding

Apr 03, 2025

CogMorph: Cognitive Morphing Attacks for Text-to-Image Models

Jan 21, 2025

Referencing Where to Focus: Improving Visual Grounding with Referential Query

Dec 26, 2024