
Minghui Liao

Visual Preference Optimization with Rubric Rewards

Apr 14, 2026
Via arXiv

DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding

Apr 14, 2026
Via arXiv

Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective

Dec 23, 2024
Via arXiv

Partial Scene Text Retrieval

Nov 15, 2024
Via arXiv

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

Oct 08, 2024
Via arXiv

TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens

Oct 07, 2024
Via arXiv

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Apr 14, 2024
Via arXiv

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Mar 05, 2024
Via arXiv

Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition

Feb 24, 2024
Via arXiv

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

Feb 21, 2024
Via arXiv