Conghui He

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Dec 10, 2024
Via arXiv

Chimera: Improving Generalist Model with Domain-Specific Experts

Dec 08, 2024
Via arXiv

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024
Via arXiv

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Dec 03, 2024
Via arXiv

Can LLMs be Good Graph Judger for Knowledge Graph Construction?

Nov 26, 2024
Via arXiv

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Oct 29, 2024
Via arXiv

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Oct 23, 2024
Via arXiv

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Oct 22, 2024
Via arXiv

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Oct 16, 2024
Via arXiv

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Oct 13, 2024
Via arXiv