Jack Lanchantin

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Mar 19, 2026

OptimalThinkingBench: Evaluating Over and Underthinking in LLMs

Aug 18, 2025

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks

Jul 31, 2025

NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks

Jul 02, 2025

Bridging Offline and Online Reinforcement Learning for LLMs

Jun 26, 2025

LLM Pretraining with Continuous Concepts

Feb 12, 2025

Diverse Preference Optimization

Jan 31, 2025

Adaptive Decoding via Latent Preference Optimization

Nov 14, 2024

TOOLVERIFIER: Generalization to New Tools via Self-Verification

Feb 21, 2024

A Data Source for Reasoning Embodied Agents

Sep 14, 2023