Lingyu Kong

Clapper: Compact Learning and Video Representation in VLMs

May 21, 2025

MatterTune: An Integrated, User-Friendly Platform for Fine-Tuning Atomistic Foundation Models to Accelerate Materials Simulation and Discovery

Apr 14, 2025

Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning

Jan 23, 2025

FOCUS: Towards Universal Foreground Segmentation

Jan 09, 2025

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Sep 03, 2024

Focus Anywhere for Fine-grained Multi-page Document Understanding

May 23, 2024

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

Apr 15, 2024

Small Language Model Meets with Reinforced Vision Vocabulary

Jan 23, 2024

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Dec 11, 2023

Merlin: Empowering Multimodal LLMs with Foresight Minds

Nov 30, 2023