Nathan Kallus

DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning

Oct 02, 2025

Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting

Sep 30, 2025

Efficient Adaptive Experimentation with Non-Compliance

May 23, 2025

Value-Guided Search for Efficient Chain-of-Thought Reasoning

May 23, 2025

Nonparametric Instrumental Variable Inference with Many Weak Instruments

May 12, 2025

From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System

Apr 21, 2025

SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

Mar 17, 2025

$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training

Feb 27, 2025

Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems

Feb 19, 2025

GST-UNet: Spatiotemporal Causal Inference with Time-Varying Confounders

Feb 07, 2025