Longteng Guo

Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation

May 11, 2026

M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering

Apr 28, 2026

AdaSpark: Adaptive Sparsity for Efficient Long-Video Understanding

Apr 09, 2026

Thinking in Streaming Video

Mar 13, 2026

S1-MMAlign: A Large-Scale, Multi-Disciplinary Dataset for Scientific Figure-Text Understanding

Jan 01, 2026

UrbanNav: Learning Language-Guided Urban Navigation from Web-Scale Human Trajectories

Dec 10, 2025

Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward

Jun 05, 2025

Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities

Apr 02, 2025

FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks

Mar 18, 2025

Efficient Motion-Aware Video MLLM

Mar 17, 2025