Ziniu Li

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Jan 13, 2026

Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements

Dec 31, 2025

A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms

Dec 28, 2025

Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning

Dec 28, 2025

Trust Region Masking for Long-Horizon LLM Reinforcement Learning

Dec 28, 2025

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Dec 21, 2025

SpeechJudge: Towards Human-Level Judgment for Speech Naturalness

Nov 11, 2025

ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling

Oct 31, 2025

Scaling Latent Reasoning via Looped Language Models

Oct 29, 2025

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Aug 24, 2025