Tianyi Lin

OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models

Jun 15, 2026

Efficient Exploration for Iterative Nash Preference Optimization

May 31, 2026

DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

May 29, 2026

Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise

May 18, 2026

How AI Aggregation Affects Knowledge

Apr 06, 2026

R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?

Feb 03, 2026

Reward-free Alignment for Conflicting Objectives

Feb 02, 2026

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Dec 21, 2025

Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO

May 16, 2025

ComPO: Preference Alignment via Comparison Oracles

May 08, 2025