Qunyi Xie

P-MTP: Efficient Document Parsing via Multi-Token Prediction with Progressive Depth Scaling

Jun 23, 2026 (via arXiv)

Unlimited OCR Works

Jun 22, 2026 (via arXiv)

ERNIE 5.0 Technical Report

Feb 04, 2026 (via arXiv)

On Data Synthesis and Post-training for Visual Abstract Reasoning

Apr 02, 2025 (via arXiv)

StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

Jun 04, 2024 (via arXiv)

MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary

Jul 24, 2023 (via arXiv)

Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

May 19, 2023 (via arXiv)