Zheng Yuan

Istituto Italiano di Tecnologia, Italy, Università di Ferrara, Italy

Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay Assessment

Sep 19, 2025

You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures

Aug 08, 2025

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Aug 01, 2025

RESAR-BEV: An Explainable Progressive Residual Autoregressive Approach for Camera-Radar Fusion in BEV Segmentation

May 10, 2025

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

May 05, 2025

Rubrik's Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset

Mar 31, 2025

REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models

Mar 20, 2025

Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization

Mar 03, 2025

Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases

Feb 20, 2025

Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation

Feb 18, 2025