Erik Miehling

The Effectiveness of Approximate Regularized Replay for Efficient Supervised Fine-Tuning of Large Language Models

Dec 26, 2025

ICX360: In-Context eXplainability 360 Toolkit

Nov 14, 2025

Generate, Evaluate, Iterate: Synthetic Data for Human-in-the-Loop Refinement of LLM Judges

Nov 06, 2025

Language Models Coupled with Metacognition Can Outperform Reasoning Models

Aug 25, 2025

Localizing Persona Representations in LLMs

May 30, 2025

Granite Guardian

Dec 10, 2024

Evaluating the Prompt Steerability of Large Language Models

Nov 19, 2024

Programming Refusal with Conditional Activation Steering

Sep 06, 2024

CELL your Model: Contrastive Explanation Methods for Large Language Models

Jun 17, 2024

Language Models in Dialogue: Conversational Maxims for Human-AI Interactions

Mar 22, 2024