Chengquan Zhang

Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe

May 05, 2026
Via arXiv

Do Phone-Use Agents Respect Your Privacy?

Apr 02, 2026
Via arXiv

Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training

Mar 25, 2026
Via arXiv

MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation

Mar 25, 2026
Via arXiv

Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity

Mar 11, 2026
Via arXiv

Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs

Jun 11, 2025
Via arXiv

Recognition-Synergistic Scene Text Editing

Mar 11, 2025
Via arXiv

R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

Oct 23, 2024
Via arXiv

DiffCSG: Differentiable CSG via Rasterization

Sep 02, 2024
Via arXiv

WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting

Jul 28, 2024
Via arXiv