Jingyue Gao

ProCeedRL: Process Critic with Exploratory Demonstration Reinforcement Learning for LLM Agentic Reasoning

Apr 02, 2026

LEMUR: Large scale End-to-end MUltimodal Recommendation

Nov 17, 2025

MARGE: Improving Math Reasoning for LLMs with Guided Exploration

May 18, 2025

Decentralized Motor Skill Learning for Complex Robotic Systems

Jun 30, 2023

Rec4Ad: A Free Lunch to Mitigate Sample Selection Bias for Ads CTR Prediction in Taobao

Jun 06, 2023

COPR: Consistency-Oriented Pre-Ranking for Online Advertising

Jun 06, 2023

Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward

Dec 03, 2022

Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model

Aug 12, 2022

Multi-Label Robust Factorization Autoencoder and its Application in Predicting Drug-Drug Interactions

Nov 01, 2018

Motif-based Rule Discovery for Predicting Real-valued Time Series

Dec 02, 2017