Junbo Niu

MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning

Oct 16, 2025

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Sep 26, 2025

Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models

Jun 15, 2025

Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions

Jun 09, 2025

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Jan 09, 2025

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Dec 12, 2024