Jiaming Ji

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective

Feb 20, 2024
Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Han Yang, Josef Dai, Xuehai Pan, Yaodong Yang

Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction

Feb 06, 2024
Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang

AI Alignment: A Comprehensive Survey

Nov 01, 2023
Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Oct 19, 2023
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

Oct 19, 2023
Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, Yaodong Yang

Baichuan 2: Open Large-scale Language Models

Sep 20, 2023
Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, Weipeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu

Safe DreamerV3: Safe Reinforcement Learning with World Models

Jul 14, 2023
Weidong Huang, Jiaming Ji, Borong Zhang, Chunhe Xia, Yaodong Yang

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset

Jul 10, 2023
Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Chi Zhang, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang
