Xuchen Li

Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning

Oct 02, 2025

VS-LLM: Visual-Semantic Depression Assessment based on LLM for Drawing Projection Test

Aug 07, 2025

CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos

Jul 22, 2025

DARTer: Dynamic Adaptive Representation Tracker for Nighttime UAV Tracking

May 01, 2025

How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking

Nov 23, 2024

Students Rather Than Experts: A New AI For Education Pipeline To Model More Human-Like And Personalised Early Adolescences

Oct 21, 2024

Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison

Oct 20, 2024

DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM

Oct 03, 2024

Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark

Sep 13, 2024

DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM

May 20, 2024