Yong Yu

Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation

Feb 15, 2026
Via arXiv

LogitsCoder: Towards Efficient Chain-of-Thought Path Search via Logits Preference Decoding for Code Generation

Feb 15, 2026
Via arXiv

Adaptive Milestone Reward for GUI Agents

Feb 12, 2026
Via arXiv

ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution

Feb 03, 2026
Via arXiv

UniCon: A Unified System for Efficient Robot Learning Transfers

Jan 21, 2026
Via arXiv

LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls

Nov 18, 2025
Via arXiv

Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering

Oct 31, 2025
Via arXiv

CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions

Oct 30, 2025
Via arXiv

A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

Oct 09, 2025
Via arXiv

Fast, Slow, and Tool-augmented Thinking for LLMs: A Review

Aug 17, 2025
Via arXiv