Feiyang Pan

CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion

Feb 11, 2026
Via arXiv

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

Feb 11, 2026
Via arXiv

Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning

May 22, 2025
Via arXiv

Style Miner: Find Significant and Stable Explanatory Factors in Time Series with Constrained Reinforcement Learning

Mar 21, 2023
Via arXiv

Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement Learning For Optimal Execution

Jul 22, 2022
Via arXiv

Follow the Prophet: Accurate Online Conversion Rate Prediction in the Face of Delayed Feedback

Aug 13, 2021
Via arXiv

GuideBoot: Guided Bootstrap for Deep Contextual Bandits

Jul 18, 2021
Via arXiv

Trust the Model When It Is Confident: Masked Model-based Actor-Critic

Oct 10, 2020
Via arXiv

GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning

May 26, 2020
Via arXiv

Towards reliable and fair probabilistic predictions: field-aware calibration with neural networks

May 28, 2019
Via arXiv