Charbel-Raphaël Segerie

Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods

May 08, 2025
Via arXiv

BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards

Jun 03, 2024
Via arXiv

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Via arXiv