Xiangyang Xue

Fudan University

VidSplice: Towards Coherent Video Inpainting via Explicit Spaced Frame Guidance

Oct 24, 2025
Via arXiv

Learning Global Representation from Queries for Vectorized HD Map Construction

Oct 08, 2025
Via arXiv

Training-Free Pyramid Token Pruning for Efficient Large Vision-Language Models via Region, Token, and Instruction-Guided Importance

Sep 19, 2025
Via arXiv

HERO: Rethinking Visual Token Early Dropping in High-Resolution Large Vision-Language Models

Sep 16, 2025
Via arXiv

From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation

Aug 13, 2025
Via arXiv

Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Jul 15, 2025
Via arXiv

Spatial-Temporal Aware Visuomotor Diffusion Policy Learning

Jul 09, 2025
Via arXiv

A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding

Jul 09, 2025
Via arXiv

TriVLA: A Triple-System-Based Unified Vision-Language-Action Model for General Robot Control

Jul 03, 2025
Via arXiv

CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios

Jul 03, 2025
Via arXiv