Hai Ye

100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models

May 01, 2025
Via arXiv

Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization

Feb 24, 2025
Via arXiv

Test-time Computing: from System-1 Thinking to System-2 Thinking

Jan 05, 2025
Via arXiv

Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration

Dec 22, 2024
Via arXiv

Self-Judge: Selective Instruction Following with Alignment Self-Evaluation

Sep 02, 2024
Via arXiv

Preference-Guided Reflective Sampling for Aligning Language Models

Aug 22, 2024
Via arXiv

On the Robustness of Question Rewriting Systems to Questions of Varying Hardness

Nov 12, 2023
Via arXiv

Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive Question Answering

Jun 11, 2023
Via arXiv

Test-Time Adaptation with Perturbation Consistency Learning

Apr 25, 2023
Via arXiv

Robust Question Answering against Distribution Shifts with Test-Time Adaptation: An Empirical Study

Feb 09, 2023
Via arXiv