Xingyu Wan

Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training

Mar 25, 2026

MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation

Mar 25, 2026

StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

Jun 04, 2024

Towards Unified Multi-granularity Text Detection with Interactive Attention

May 30, 2024

Auxiliary Loss Adaptation for Image Inpainting

Nov 22, 2021

Teacher-Student Asynchronous Learning with Multi-Source Consistency for Facial Landmark Detection

Dec 12, 2020

End-to-End Multi-Object Tracking with Global Response Map

Jul 13, 2020