Jason Weston

Google

NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks

Jul 02, 2025

J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning

May 15, 2025

Multi-Token Attention

Apr 01, 2025

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks

Mar 19, 2025

LLM Pretraining with Continuous Concepts

Feb 12, 2025

Diverse Preference Optimization

Jan 31, 2025

R.I.P.: Better Models by Survival of the Fittest Prompts

Jan 30, 2025

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

Jan 30, 2025

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Jan 18, 2025

Byte Latent Transformer: Patches Scale Better Than Tokens

Dec 13, 2024