
Yulong Ao

Beijing Academy of Artificial Intelligence

Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models

Mar 12, 2026

RoboBrain 2.5: Depth in Sight, Time in Mind

Jan 20, 2026

TrimTokenator-LC: Towards Adaptive Visual Token Pruning for Large Multimodal Models with Long Contexts

Dec 31, 2025

Emu3.5: Native Multimodal Models are World Learners

Oct 30, 2025

RoboOS-NeXT: A Unified Memory-based Framework for Lifelong, Scalable, and Robust Multi-Robot Collaboration

Oct 30, 2025

RoboBrain 2.0 Technical Report

Jul 02, 2025

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

Oct 24, 2024

Emu3: Next-Token Prediction is All You Need

Sep 27, 2024

Aquila2 Technical Report

Aug 14, 2024

AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

Aug 13, 2024