Xuehai Pan — Publications

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective

Feb 20, 2024

Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction

Feb 06, 2024

AI Alignment: A Comprehensive Survey

Nov 01, 2023

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Oct 19, 2023

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

Oct 19, 2023

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models

Oct 10, 2023

Baichuan 2: Open Large-scale Language Models

Sep 20, 2023

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset

Jul 10, 2023

OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research

May 16, 2023

Proactive Multi-Camera Collaboration For 3D Human Pose Estimation

Mar 07, 2023