Pengfei Liu

Progress or Regress? Self-Improvement Reversal in Post-training

Jul 06, 2024
Via arXiv

FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models

Jul 01, 2024
Via arXiv

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

Jun 26, 2024
Via arXiv

BeHonest: Benchmarking Honesty of Large Language Models

Jun 19, 2024
Via arXiv

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

Jun 18, 2024
Via arXiv

MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation

Jun 09, 2024
Via arXiv

Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization

Jun 01, 2024
Via arXiv

RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models

May 23, 2024
Via arXiv

Benchmarking Benchmark Leakage in Large Language Models

Apr 29, 2024
Via arXiv

RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

Apr 22, 2024
Via arXiv