Mengdi Wang

Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning

Feb 16, 2024

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

Feb 14, 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Feb 07, 2024

Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy

Feb 06, 2024

TurboSVM-FL: Boosting Federated Learning through SVM Aggregation for Lazy Clients

Jan 29, 2024

Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

Jan 08, 2024

Scalable Normalizing Flows Enable Boltzmann Generators for Macromolecules

Jan 08, 2024

Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning?

Nov 29, 2023

Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation

Nov 04, 2023

Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks

Oct 16, 2023