Haitao Mi

Self-Consistency Boosts Calibration for Math Reasoning

Mar 14, 2024
Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation

Mar 06, 2024
Xiangci Li, Linfeng Song, Lifeng Jin, Haitao Mi, Jessica Ouyang, Dong Yu

Collaborative decoding of critical tokens for boosting factuality of large language models

Feb 28, 2024
Lifeng Jin, Baolin Peng, Linfeng Song, Haitao Mi, Ye Tian, Dong Yu

Fine-Grained Self-Endorsement Improves Factuality and Reasoning

Feb 23, 2024
Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

Feb 14, 2024
Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng

Inconsistent dialogue responses and how to recover from them

Jan 18, 2024
Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Dong Yu

The Trickle-down Impact of Reward (In-)consistency on RLHF

Sep 28, 2023
Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu

Stabilizing RLHF through Advantage Model and Selective Rehearsal

Sep 18, 2023
Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu

Search-Engine-augmented Dialogue Response Generation with Cheaply Supervised Query Production

Feb 16, 2023
Ante Wang, Linfeng Song, Qi Liu, Haitao Mi, Longyue Wang, Zhaopeng Tu, Jinsong Su, Dong Yu

Friend-training: Learning from Models of Different but Related Tasks

Jan 31, 2023
Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Xiabing Zhou, Dong Yu
