
Ted Moskovitz

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation

Apr 10, 2024

The Transient Nature of Emergent In-Context Learning in Transformers

Nov 15, 2023

Confronting Reward Model Overoptimization with Constrained RLHF

Oct 10, 2023

A State Representation for Diminishing Rewards

Sep 07, 2023

ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs

Feb 02, 2023

Transfer RL via the Undo Maps Formalism

Nov 26, 2022

Minimum Description Length Control

Jul 24, 2022

Towards an Understanding of Default Policies in Multitask Policy Optimization

Nov 06, 2021

A First-Occupancy Representation for Reinforcement Learning

Oct 06, 2021

Deep Reinforcement Learning with Dynamic Optimism

Feb 09, 2021