Junbo Niu

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Mar 23, 2026
Via arXiv

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Feb 02, 2026
Via arXiv

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Dec 18, 2025
Via arXiv

DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM

Dec 11, 2025
Via arXiv

VABench: A Comprehensive Benchmark for Audio-Video Generation

Dec 10, 2025
Via arXiv

MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning

Oct 16, 2025
Via arXiv

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Sep 26, 2025
Via arXiv

Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models

Jun 15, 2025
Via arXiv

Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions

Jun 09, 2025
Via arXiv

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Jan 09, 2025
Via arXiv