Jingqun Tang

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

Jul 02, 2024
Via arXiv

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

Jun 03, 2024
Via arXiv

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

May 20, 2024
Via arXiv

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Apr 19, 2024
Via arXiv

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

Nov 23, 2023
Via arXiv

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

Sep 02, 2023
Via arXiv

SPTS v2: Single-Point Scene Text Spotting

Jan 04, 2023
Via arXiv

Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning

Jul 26, 2022
Via arXiv

Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection

Mar 30, 2022
Via arXiv