
Alexander Bukharin

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

May 16, 2025

Llama-Nemotron: Efficient Reasoning Models

May 02, 2025

Adversarial Training of Reward Models

Apr 08, 2025

HelpSteer2-Preference: Complementing Ratings with Preferences

Oct 02, 2024

Robust Reinforcement Learning from Corrupted Human Feedback

Jun 21, 2024

Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

Jun 04, 2024

Data Diversity Matters for Robust Instruction Tuning

Nov 21, 2023

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

Oct 16, 2023

Deep Reinforcement Learning from Hierarchical Weak Preference Feedback

Sep 06, 2023

Machine Learning Force Fields with Data Cost Aware Training

Jun 05, 2023