Sonnet Generation


Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Feb 05, 2026
Via arXiv

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

Feb 05, 2026
Via arXiv

Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases

Feb 04, 2026
Via arXiv

Understanding LLM Evaluator Behavior: A Structured Multi-Evaluator Framework for Merchant Risk Assessment

Feb 04, 2026
Via arXiv

Clarify Before You Draw: Proactive Agents for Robust Text-to-CAD Generation

Feb 03, 2026
Via arXiv

Qualitative Evaluation of LLM-Designed GUI

Jan 30, 2026
Via arXiv

ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems

Feb 02, 2026
Via arXiv

On the Paradoxical Interference between Instruction-Following and Task Solving

Jan 29, 2026
Via arXiv

Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs

Jan 28, 2026
Via arXiv

OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

Jan 28, 2026
Via arXiv