Sonnet Generation


Can LLMs Simulate Human Behavioral Variability? A Case Study in the Phonemic Fluency Task

May 22, 2025
Via arXiv

MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks

May 22, 2025
Via arXiv

SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development

May 22, 2025
Via arXiv

DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation

May 20, 2025
Via arXiv

RAR: Setting Knowledge Tripwires for Retrieval Augmented Rejection

May 19, 2025
Via arXiv

ACSE-Eval: Can LLMs threat model real-world cloud infrastructure?

May 16, 2025
Via arXiv

A document processing pipeline for the construction of a dataset for topic modeling based on the judgments of the Italian Supreme Court

May 13, 2025
Via arXiv

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

May 15, 2025
Via arXiv

Measuring General Intelligence with Generated Games

May 12, 2025
Via arXiv

NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context

May 13, 2025
Via arXiv