Can Huang

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

Jul 02, 2024
Via arXiv

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

Jun 03, 2024
Via arXiv

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

May 20, 2024
Via arXiv

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Apr 19, 2024
Via arXiv

PURPLE: Making a Large Language Model a Better SQL Writer

Mar 29, 2024
Via arXiv

Elysium: Exploring Object-level Perception in Videos via MLLM

Mar 29, 2024
Via arXiv

Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation

Feb 27, 2024
Via arXiv

PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition

Feb 15, 2024
Via arXiv

GloTSFormer: Global Video Text Spotting Transformer

Jan 08, 2024
Via arXiv

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding

Nov 30, 2023
Via arXiv