
Xiangyang Xue

Fudan University

Learning Global Representation from Queries for Vectorized HD Map Construction

Oct 08, 2025, via arXiv

Training-Free Pyramid Token Pruning for Efficient Large Vision-Language Models via Region, Token, and Instruction-Guided Importance

Sep 19, 2025, via arXiv

HERO: Rethinking Visual Token Early Dropping in High-Resolution Large Vision-Language Models

Sep 16, 2025, via arXiv

From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation

Aug 13, 2025, via arXiv

Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Jul 15, 2025, via arXiv

A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding

Jul 09, 2025, via arXiv

Spatial-Temporal Aware Visuomotor Diffusion Policy Learning

Jul 09, 2025, via arXiv

TriVLA: A Triple-System-Based Unified Vision-Language-Action Model for General Robot Control

Jul 03, 2025, via arXiv

CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios

Jul 03, 2025, via arXiv

TriVLA: A Unified Triple-System-Based Unified Vision-Language-Action Model for General Robot Control

Jul 02, 2025, via arXiv