Huawen Shen

Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training

Mar 25, 2026
Via arXiv

MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation

Mar 25, 2026
Via arXiv

Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT

Mar 10, 2026
Via arXiv

Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective

Aug 06, 2025
Via arXiv

Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts

Jun 05, 2025
Via arXiv

Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts

Dec 27, 2024
Via arXiv

LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining

Dec 19, 2024
Via arXiv

Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues

Dec 17, 2024
Via arXiv

Falcon-UI: Understanding GUI Before Following User Instructions

Dec 12, 2024
Via arXiv

Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition

Jul 09, 2024
Via arXiv