Mingxin Huang

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024
Via arXiv

Creating a Microstructure Latent Space with Rich Material Information for Multiphase Alloy Design

Sep 04, 2024
Via arXiv

Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models

Aug 09, 2024
Via arXiv

Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping

Aug 04, 2024
Via arXiv

VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization

Apr 30, 2024
Via arXiv

Bridging the Gap Between End-to-End and Two-Step Text Spotting

Apr 06, 2024
Via arXiv

SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting

Jan 15, 2024
Via arXiv

Progressive Evolution from Single-Point to Polygon for Scene Text

Dec 21, 2023
Via arXiv

Hierarchical Side-Tuning for Vision Transformers

Oct 10, 2023
Via arXiv

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

Aug 20, 2023
Via arXiv