Dong Yu

MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

May 29, 2024
Via arXiv

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Apr 18, 2024
Via arXiv

Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models

Apr 14, 2024
Via arXiv

Polarity Calibration for Opinion Summarization

Apr 02, 2024
Via arXiv

Conceptual and Unbiased Reasoning in Language Models

Mar 30, 2024
Via arXiv

Self-Consistency Boosts Calibration for Math Reasoning

Mar 14, 2024
Via arXiv

A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation

Mar 06, 2024
Via arXiv

Can Large Language Models do Analytical Reasoning?

Mar 06, 2024
Via arXiv

Collaborative decoding of critical tokens for boosting factuality of large language models

Feb 28, 2024
Via arXiv

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

Feb 27, 2024
Via arXiv