Le Sun

READoc: A Unified Benchmark for Realistic Document Structured Extraction

Sep 08, 2024

Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic

Aug 29, 2024

CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution

Aug 23, 2024

DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation

Aug 23, 2024

REInstruct: Building Instruction Data from Unlabeled Corpus

Aug 20, 2024

StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation

Aug 07, 2024

Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models

Jul 16, 2024

On-Policy Fine-grained Knowledge Feedback for Hallucination Mitigation

Jun 18, 2024

Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors

Jun 13, 2024

Open Grounded Planning: Challenges and Benchmark Construction

Jun 05, 2024