Seonglae Cho

Control Reinforcement Learning: Interpretable Token-Level Steering of LLMs via Sparse Autoencoder Features

Feb 12, 2026

The Confidence Manifold: Geometric Structure of Correctness Representations in Language Models

Feb 08, 2026

CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection

Aug 18, 2025

LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries

May 13, 2025

RTSUM: Relation Triple-based Interpretable Summarization with Multi-level Salience Visualization

Oct 21, 2023