Baolin Peng

Adapting Web Agents with Synthetic Supervision

Nov 08, 2025
Via arXiv

Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

Oct 10, 2025
Via arXiv

Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation

Jul 09, 2025
Via arXiv

Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Apr 30, 2025
Via arXiv

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Apr 29, 2025
Via arXiv

Magma: A Foundation Model for Multimodal AI Agents

Feb 18, 2025
Via arXiv

On the Emergence of Thinking in LLMs I: Searching for the Right Intuition

Feb 10, 2025
Via arXiv

Teaching AI Agents to Search with Reflective-MCTS and Exploratory Learning

Oct 15, 2024
Via arXiv

Latent Action Pretraining from Videos

Oct 15, 2024
Via arXiv

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning

Oct 09, 2024
Via arXiv