Can Huang

Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning

May 21, 2025

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting

May 20, 2025

Advancing Sequential Numerical Prediction in Autoregressive Models

May 19, 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?

May 16, 2025

Seed1.5-VL Technical Report

May 11, 2025

Vision as LoRA

Mar 26, 2025

EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models

Mar 06, 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Dec 12, 2024

Grounding Natural Language to SQL Translation with Data-Based Self-Explanations

Nov 05, 2024