Lianwen Jin

DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding

Aug 27, 2024

Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models

Aug 09, 2024

LEGO: Self-Supervised Representation Learning for Scene Text Images

Aug 04, 2024

Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping

Aug 04, 2024

Generalized Tampered Scene Text Detection in the era of Generative AI

Jul 31, 2024

TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models

Jul 04, 2024

DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming

Jun 27, 2024

Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction

Jun 05, 2024

Deciphering Oracle Bone Language with Diffusion Models

Jun 02, 2024

C$^{3}$Bench: A Comprehensive Classical Chinese Understanding Benchmark for Large Language Models

May 28, 2024