Chengquan Zhang

Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training

Mar 25, 2026
Via arXiv

MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation

Mar 25, 2026
Via arXiv

Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity

Mar 11, 2026
Via arXiv

Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs

Jun 11, 2025
Via arXiv

Recognition-Synergistic Scene Text Editing

Mar 11, 2025
Via arXiv

R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

Oct 23, 2024
Via arXiv

DiffCSG: Differentiable CSG via Rasterization

Sep 02, 2024
Via arXiv

WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting

Jul 28, 2024
Via arXiv

StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

Jun 04, 2024
Via arXiv

Towards Unified Multi-granularity Text Detection with Interactive Attention

May 30, 2024
Via arXiv