Picture for Shuhuai Ren

Shuhuai Ren

MiMo-V2-Flash Technical Report

Add code
Jan 08, 2026
Viaarxiv icon

MiMo-Audio: Audio Language Models are Few-Shot Learners

Add code
Dec 29, 2025
Viaarxiv icon

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

Add code
Dec 19, 2025
Viaarxiv icon

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

Add code
May 28, 2025
Viaarxiv icon

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Add code
May 12, 2025
Viaarxiv icon

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Add code
Apr 24, 2025
Viaarxiv icon

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Add code
Mar 20, 2025
Viaarxiv icon

UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?

Add code
Mar 13, 2025
Viaarxiv icon

Next Block Prediction: Video Generation via Semi-Autoregressive Modeling

Add code
Feb 12, 2025
Viaarxiv icon

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Add code
Dec 30, 2024
Figure 1 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Figure 2 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Figure 3 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Figure 4 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Viaarxiv icon