Bilgehan Sel

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning
Mar 30, 2026

Reinforcement Learning with Backtracking Feedback
Feb 09, 2026

Backtracking for Safety
Mar 11, 2025

A CMDP-within-online framework for Meta-Safe Reinforcement Learning
May 26, 2024

Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning
May 26, 2024

Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs
May 21, 2024

Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation
May 02, 2024

A Human-on-the-Loop Optimization Autoformalism Approach for Sustainability
Aug 23, 2023

Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
Aug 20, 2023

On Solution Functions of Optimization: Universal Approximation and Covering Number Bounds
Dec 02, 2022