Akshay Krishnamurthy

Carnegie Mellon University

Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization

Jul 18, 2024

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Jun 17, 2024

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF

May 31, 2024

Rich-Observation Reinforcement Learning with Continuous Latent Dynamics

May 29, 2024

Can large language models explore in-context?

Mar 22, 2024

Scalable Online Exploration via Coverability

Mar 11, 2024

Mitigating Covariate Shift in Misspecified Regression with Applications to Reinforcement Learning

Jan 22, 2024

Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression

Oct 17, 2023

Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits

Jun 13, 2023

Exposing Attention Glitches with Flip-Flop Language Modeling

Jun 01, 2023