Greg Durrett

Causal Graph based Event Reasoning using Semantic Relation Experts

Jun 07, 2025
Via arXiv

OpenThoughts: Data Recipes for Reasoning Models

Jun 05, 2025
Via arXiv

Sparta Alignment: Collectively Aligning Multiple Language Models through Combat

Jun 05, 2025
Via arXiv

AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy

May 28, 2025
Via arXiv

Learning Composable Chains-of-Thought

May 28, 2025
Via arXiv

CLEVER: A Curated Benchmark for Formally Verified Code Generation

May 21, 2025
Via arXiv

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

May 19, 2025
Via arXiv

EvalAgent: Discovering Implicit Evaluation Criteria from the Web

Apr 21, 2025
Via arXiv

CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

Apr 21, 2025
Via arXiv

Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation

Apr 20, 2025
Via arXiv