Swarnadeep Saha

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Mar 19, 2026
Via arXiv

Text-to-Stage: Spatial Layouts from Long-form Narratives

Mar 18, 2026
Via arXiv

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Oct 08, 2025
Via arXiv

OptimalThinkingBench: Evaluating Over and Underthinking in LLMs

Aug 18, 2025
Via arXiv

Bridging Offline and Online Reinforcement Learning for LLMs

Jun 26, 2025
Via arXiv

J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning

May 15, 2025
Via arXiv

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

Jan 30, 2025
Via arXiv

MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning

Sep 18, 2024
Via arXiv

System-1.x: Learning to Balance Fast and Slow Planning with Language Models

Jul 19, 2024
Via arXiv

MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models

Feb 02, 2024
Via arXiv