Binghong Wu

ChartArena: Benchmarking Chart Parsing across Languages, Scenarios, and Formats

May 31, 2026
Via arXiv

Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

May 21, 2026
Via arXiv

Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

May 21, 2025
Via arXiv

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting

May 20, 2025
Via arXiv

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024
Via arXiv

Harmonizing Visual Text Comprehension and Generation

Jul 23, 2024
Via arXiv

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

Jul 02, 2024
Via arXiv

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

Jun 03, 2024
Via arXiv

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Apr 19, 2024
Via arXiv

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

Nov 23, 2023
Via arXiv