
Yo-Sub Han

Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations

May 11, 2026
Via arXiv

NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding

May 11, 2026
Via arXiv

CRaFT: Circuit-Guided Refusal Feature Selection via Cross-Layer Transcoders

Apr 02, 2026
Via arXiv

Steering Language Models Before They Speak: Logit-Level Interventions

Jan 16, 2026
Via arXiv

How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs

Jan 07, 2026
Via arXiv

WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking

Nov 13, 2025
Via arXiv

AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection

May 26, 2025
Via arXiv

LogiCase: Effective Test Case Generation from Logical Description in Competitive Programming

May 21, 2025
Via arXiv

Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code

Feb 26, 2025
Via arXiv

Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features

Feb 25, 2025
Via arXiv