Miao Lu

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

May 26, 2024
Via arXiv

Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm

Apr 04, 2024
Via arXiv

Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates

Oct 26, 2023
Via arXiv

One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration

May 29, 2023
Via arXiv

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

May 16, 2023
Via arXiv

Robust Consensus Clustering and its Applications for Advertising Forecasting

Dec 27, 2022
Via arXiv

Video Background Music Generation: Dataset, Method and Evaluation

Nov 21, 2022
Via arXiv

Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach

Sep 12, 2022
Via arXiv

Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

May 26, 2022
Via arXiv

GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

Apr 14, 2022
Via arXiv