Minghui Liao

DiffMath: Symbol- and Graph-Aware Latent Diffusion Transformer for Handwritten Mathematical Expression Generation

Jun 18, 2026

Reinforcement Learning with Robust Rubric Rewards

May 28, 2026

DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding

Apr 14, 2026

Visual Preference Optimization with Rubric Rewards

Apr 14, 2026

Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective

Dec 23, 2024

Partial Scene Text Retrieval

Nov 15, 2024

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

Oct 08, 2024

TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens

Oct 07, 2024

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Apr 14, 2024

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Mar 05, 2024