Sanghyun Hong

When Can You Poison Rewards? A Tight Characterization of Reward Poisoning in Linear MDPs

Apr 11, 2026

Fail-Closed Alignment for Large Language Models

Feb 19, 2026

Discovering Universal Activation Directions for PII Leakage in Language Models

Feb 19, 2026

Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents

Aug 23, 2025

MADCAT: Combating Malware Detection Under Concept Drift with Test-Time Adaptation

May 24, 2025

Data Therapist: Eliciting Domain Knowledge from Subject Matter Experts Using Large Language Models

May 01, 2025

Hessian-aware Training for Enhancing DNNs Resilience to Parameter Corruptions

Apr 02, 2025

Understanding and Mitigating Membership Inference Risks of Neural Ordinary Differential Equations

Jan 12, 2025

PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips

Dec 10, 2024

SoK: Watermarking for AI-Generated Content

Nov 27, 2024