Zhangyue Yin

RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization

Nov 06, 2025
Via arXiv

Dynamic and Generalizable Process Reward Modeling

Jul 23, 2025
Via arXiv

R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning

May 26, 2025
Via arXiv

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025
Via arXiv

FamilyTool: A Multi-hop Personalized Tool Use Benchmark

Apr 09, 2025
Via arXiv

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?

Feb 17, 2025
Via arXiv

Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework

Jan 26, 2025
Via arXiv

VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks

Dec 24, 2024
Via arXiv

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

Dec 18, 2024
Via arXiv

Unified Active Retrieval for Retrieval Augmented Generation

Jun 18, 2024
Via arXiv