Xiaoyu Shen

UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents

Apr 13, 2026
Via arXiv

Evaluation Before Generation: A Paradigm for Robust Multimodal Sentiment Analysis with Missing Modalities

Apr 07, 2026
Via arXiv

CeRLP: A Cross-embodiment Robot Local Planning Framework for Visual Navigation

Mar 20, 2026
Via arXiv

From Static Inference to Dynamic Interaction: Navigating the Landscape of Streaming Large Language Models

Mar 04, 2026
Via arXiv

Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models

Mar 03, 2026
Via arXiv

Beyond Global Similarity: Towards Fine-Grained, Multi-Condition Multimodal Retrieval

Mar 01, 2026
Via arXiv

What Do Visual Tokens Really Encode? Uncovering Sparsity and Redundancy in Multimodal Large Language Models

Feb 28, 2026
Via arXiv

HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit

Feb 27, 2026
Via arXiv

UTPTrack: Towards Simple and Unified Token Pruning for Visual Tracking

Feb 27, 2026
Via arXiv

Rethinking the Role of LLMs in Time Series Forecasting

Feb 16, 2026
Via arXiv