Jiantao Jiao

PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost

Mar 22, 2026, via arXiv

Towards Anytime-Valid Statistical Watermarking

Feb 19, 2026, via arXiv

dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

Dec 24, 2025, via arXiv

GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments

Sep 26, 2025, via arXiv

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

Jun 12, 2025, via arXiv

Sample Complexity and Representation Ability of Test-time Scaling Paradigms

Jun 05, 2025, via arXiv

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

May 18, 2025, via arXiv

How Do LLMs Perform Two-Hop Reasoning in Context?

Feb 19, 2025, via arXiv

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

Feb 05, 2025, via arXiv

How to Evaluate Reward Models for RLHF

Oct 18, 2024, via arXiv