Xiameng Qin

Collaborative Position Reasoning Network for Referring Image Segmentation

Jan 22, 2024

MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary

Jul 24, 2023

TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision

Jun 06, 2023

Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

May 19, 2023

StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training

Mar 01, 2023

Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering

Dec 14, 2021

StrucTexT: Structured Text Understanding with Multi-Modal Transformers

Aug 10, 2021

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

Sep 20, 2019