Rishabh Agarwal

Dima

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Aug 29, 2024
Via arXiv

Generative Verifiers: Reward Modeling as Next-Token Prediction

Aug 27, 2024
Via arXiv

Gemma 2: Improving Open Language Models at a Practical Size

Aug 02, 2024
Via arXiv

Don't Throw Away Data: Better Sequence Knowledge Distillation

Jul 15, 2024
Via arXiv

On scalable oversight with weak LLMs judging strong LLMs

Jul 05, 2024
Via arXiv

SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning

Jun 21, 2024
Via arXiv

Many-Shot In-Context Learning

Apr 17, 2024
Via arXiv

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Mar 06, 2024
Via arXiv

Transformers Can Achieve Length Generalization But Not Robustly

Feb 14, 2024
Via arXiv

V-STaR: Training Verifiers for Self-Taught Reasoners

Feb 09, 2024
Via arXiv