Zhijiang Guo

Activation-Guided Consensus Merging for Large Language Models

May 20, 2025
Via arXiv

EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code

May 19, 2025
Via arXiv

TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios

May 19, 2025
Via arXiv

From System 1 to System 2: A Survey of Reasoning Large Language Models

Feb 25, 2025
Via arXiv

RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?

Jan 20, 2025
Via arXiv

The Automated Verification of Textual Claims (AVeriTeC) Shared Task

Oct 31, 2024
Via arXiv

Long$^2$RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall

Oct 31, 2024
Via arXiv

FormalAlign: Automated Alignment Evaluation for Autoformalization

Oct 14, 2024
Via arXiv

Effi-Code: Unleashing Code Efficiency in Language Models

Oct 14, 2024
Via arXiv

Aligning with Logic: Measuring, Evaluating and Improving Logical Consistency in Large Language Models

Oct 05, 2024
Via arXiv