Qi Zheng

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Apr 08, 2024
Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

Jan 03, 2024
Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang

Vision Grid Transformer for Document Layout Analysis

Aug 29, 2023
Cheng Da, Chuwei Luo, Qi Zheng, Cong Yao

LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition

Aug 24, 2023
Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Apr 21, 2023
Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao

LORE: Logical Location Regression Network for Table Structure Recognition

Mar 07, 2023
Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu

ESceme: Vision-and-Language Navigation with Episodic Scene Memory

Mar 07, 2023
Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, Dacheng Tao

Cross-Modal Contrastive Learning for Robust Reasoning in VQA

Nov 21, 2022
Qi Zheng, Chaoyue Wang, Daqing Liu, Dadong Wang, Dacheng Tao

Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding

Jun 27, 2022
Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si
