Luckeciano C. Melo

Iterative Deployment Improves Planning Skills in LLMs

Dec 31, 2025
Via arXiv

Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments

Nov 05, 2025
Via arXiv

Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning

Oct 01, 2025
Via arXiv

InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context

Feb 17, 2025
Via arXiv

Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning

Oct 17, 2024
Via arXiv

Temporal-Difference Variational Continual Learning

Oct 10, 2024
Via arXiv

Deep Bayesian Active Learning for Preference Modeling in Large Language Models

Jun 14, 2024
Via arXiv

Transformers are Meta-Reinforcement Learners

Jun 14, 2022
Via arXiv

MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces

Sep 30, 2020
Via arXiv

Bottom-Up Meta-Policy Search

Oct 22, 2019
Via arXiv