Jonathan Michala

Three Concrete Challenges and Two Hopes for the Safety of Unsupervised Elicitation

Feb 23, 2026
Via arXiv

Abstractive Red-Teaming of Language Model Character

Feb 12, 2026
Via arXiv

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

Jan 15, 2026
Via arXiv

Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

Jun 08, 2025
Via arXiv

Mechanistic Decomposition of Sentence Representations

Jun 04, 2025
Via arXiv