
An Alternate Policy Gradient Estimator for Softmax Policies

Dec 22, 2021
Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood

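For context on the baseline this paper contrasts against: the classic likelihood-ratio gradient of a linear softmax policy is ∇θ log π(a|s) = φ(s,a) − Σ_b π(b|s) φ(s,b). A minimal illustrative sketch (this is the standard estimator, not the paper's alternate one; the function and feature names are placeholders):

```python
import numpy as np

def softmax_logpi_grad(theta, phi, a):
    """Gradient of log pi(a|s) for a linear softmax policy,
    pi(a|s) proportional to exp(theta @ phi[a]).

    theta : (d,) policy parameters
    phi   : (n_actions, d) state-action feature matrix for state s
    a     : index of the action taken
    """
    prefs = phi @ theta
    prefs = prefs - prefs.max()        # subtract max for numerical stability
    pi = np.exp(prefs)
    pi = pi / pi.sum()
    # phi(s, a) minus the policy-weighted average feature vector
    return phi[a] - pi @ phi
```

A defining property of this estimator is that the policy-weighted gradients sum to zero, Σ_a π(a|s) ∇ log π(a|s) = 0, which is what makes subtracting a baseline unbiased.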

Continual Backprop: Stochastic Gradient Descent with Persistent Randomness

Aug 13, 2021
Shibhansh Dohare, A. Rupam Mahmood, Richard S. Sutton


Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

Jul 17, 2021
Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

* Submitted to JMLR 


Analyzing Neural Jacobian Methods in Applications of Visual Servoing and Kinematic Control

Jun 10, 2021
Michael Przystupa, Masood Dehghan, Martin Jagersand, A. Rupam Mahmood

* 8 pages, 6 figures


Model-free Policy Learning with Reward Gradients

Mar 09, 2021
Qingfeng Lan, A. Rupam Mahmood


Autoregressive Policies for Continuous Control Deep Reinforcement Learning

Mar 27, 2019
Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra

* Submitted to the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019); companion video and code available


Benchmarking Reinforcement Learning Algorithms on Real-World Robots

Sep 20, 2018
A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra

* Appears in Proceedings of the Second Conference on Robot Learning (CoRL 2018); companion video and source code available


Setting up a Reinforcement Learning Task with a Real-World Robot

Mar 19, 2018
A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra

* Submitted to 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 


True Online Temporal-Difference Learning

Sep 08, 2016
Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton

* Journal of Machine Learning Research (JMLR), 17(145):1-40, 2016 
* This is the published JMLR version, substantially improved over the earlier preprint. The main changes are: 1) restructuring of the article; 2) additional analysis of the forward view; 3) an empirical comparison of the traditional and new forward views; 4) added discussion of other true-online papers; 5) updated discussion of non-linear function approximation

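The algorithm this paper presents, true online TD(λ) with linear function approximation, admits a compact per-step update built on a "Dutch" eligibility trace. A minimal sketch following the published update equations (the episode/transition encoding here is an assumption for illustration):

```python
import numpy as np

def true_online_td_lambda(episodes, n_features, alpha=0.1, gamma=0.9, lam=0.8):
    """True online TD(lambda) with linear value function v(s) = w @ x(s).

    episodes : list of episodes; each episode is a list of transitions
               (x, r, x_next), where x is the feature vector of the
               current state and x_next is None at termination (value 0).
    """
    w = np.zeros(n_features)
    for episode in episodes:
        e = np.zeros(n_features)   # Dutch eligibility trace
        v_old = 0.0
        for x, r, x_next in episode:
            v = w @ x
            v_next = 0.0 if x_next is None else w @ x_next
            delta = r + gamma * v_next - v
            # Dutch-trace update (differs from the accumulating trace
            # by the correction term proportional to e @ x)
            e = gamma * lam * e + x - alpha * gamma * lam * (e @ x) * x
            # weight update with the v - v_old correction that makes the
            # algorithm exactly equivalent to the online forward view
            w = w + alpha * (delta + v - v_old) * e - alpha * (v - v_old) * x
            v_old = v_next
    return w
```

With lam=0 the trace reduces to e = x and the correction terms cancel, so the method coincides exactly with linear TD(0); the gains over conventional TD(λ) appear for λ > 0.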

Emphatic Temporal-Difference Learning

Jul 06, 2015
A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

* 9 pages, accepted for presentation at European Workshop on Reinforcement Learning 


An Empirical Evaluation of True Online TD(λ)

Jul 01, 2015
Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton

* European Workshop on Reinforcement Learning (EWRL) 2015 


An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

Apr 21, 2015
Richard S. Sutton, A. Rupam Mahmood, Martha White

* Journal of Machine Learning Research 17(73): 1-29, 2016 
* 29 pages. This is a significant revision based on the first set of reviews; the most important change was to signal early that the main result is about stability, not convergence
