Huanqian Wang

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Jul 11, 2024
Via arXiv

Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

Sep 04, 2023
Via arXiv