Swarnadeep Saha

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Oct 08, 2025
Via arXiv

OptimalThinkingBench: Evaluating Over and Underthinking in LLMs

Aug 18, 2025
Via arXiv

Bridging Offline and Online Reinforcement Learning for LLMs

Jun 26, 2025
Via arXiv

J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning

May 15, 2025
Via arXiv

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

Jan 30, 2025
Via arXiv

MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning

Sep 18, 2024
Via arXiv

System-1.x: Learning to Balance Fast and Slow Planning with Language Models

Jul 19, 2024
Via arXiv

MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models

Feb 02, 2024
Via arXiv

Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

Oct 23, 2023
Via arXiv

ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs

Sep 22, 2023
Via arXiv