Anthropic


Split Personality Training: Revealing Latent Knowledge Through Alternate Personalities

Feb 05, 2026

Qualitative Evaluation of LLM-Designed GUI

Jan 30, 2026

OpenLearnLM Benchmark: A Unified Framework for Evaluating Knowledge, Skill, and Attitude in Educational Large Language Models

Jan 20, 2026

Early Prediction of Type 2 Diabetes Using Multimodal data and Tabular Transformers

Jan 19, 2026

DoPE: Decoy Oriented Perturbation Encapsulation Human-Readable, AI-Hostile Documents for Academic Integrity

Jan 18, 2026

ARC Prize 2025: Technical Report

Jan 15, 2026

Regulatory gray areas of LLM Terms

Jan 13, 2026

Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks

Jan 09, 2026

Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset

Jan 09, 2026

When the Coffee Feature Activates on Coffins: An Analysis of Feature Extraction and Steering for Mechanistic Interpretability

Jan 06, 2026