Radu Marinescu

Boosting AND/OR-Based Computational Protein Design: Dynamic Heuristics and Generalizable UFO

Aug 31, 2023
Bobak Pezeshki, Radu Marinescu, Alexander Ihler, Rina Dechter

Scientific computing has experienced a surge empowered by advancements in technologies such as neural networks. However, certain important tasks are less amenable to these technologies, benefiting instead from innovations to traditional inference schemes. One such task is protein re-design. Recently, a new re-design algorithm, AOBB-K*, was introduced and shown to be competitive with the state-of-the-art BBK* on small protein re-design problems. However, AOBB-K* did not scale well. In this work we focus on scaling up AOBB-K* and introduce three new versions: AOBB-K*-b (boosted), AOBB-K*-DH (with dynamic heuristics), and AOBB-K*-UFO (with underflow optimization) that significantly enhance scalability.
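For intuition about the style of search these solvers build on, here is a minimal, generic depth-first branch-and-bound sketch. It is not AOBB-K* itself (which explores an AND/OR search space with problem-specific bounds and a K* objective); the `score` and `upper_bound` callables below are hypothetical stand-ins.

```python
# Minimal sketch: plain depth-first branch-and-bound over discrete variable
# assignments with a pruning upper bound. Illustrative only; AOBB-K* searches
# an AND/OR space with far stronger, problem-specific heuristics.
def dfbb(variables, domains, score, upper_bound, assignment=None, best=(float("-inf"), None)):
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):              # complete assignment reached
        value = score(assignment)
        return (value, dict(assignment)) if value > best[0] else best
    var = variables[len(assignment)]                    # static variable ordering
    for val in domains[var]:
        assignment[var] = val
        if upper_bound(assignment) > best[0]:           # prune branches that cannot improve
            best = dfbb(variables, domains, score, upper_bound, assignment, best)
        del assignment[var]
    return best

# Toy usage with additive utilities; the optimistic bound assumes every
# unassigned variable attains its best value.
domains = {"x1": [0, 1], "x2": [0, 1, 2]}
util = {("x1", 0): 1.0, ("x1", 1): 0.5, ("x2", 0): 0.2, ("x2", 1): 1.5, ("x2", 2): 0.7}
score = lambda a: sum(util[v, a[v]] for v in a)
bound = lambda a: score(a) + sum(max(util[v, d] for d in domains[v]) for v in domains if v not in a)
print(dfbb(list(domains), domains, score, bound))       # -> (2.5, {'x1': 0, 'x2': 1})
```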

* In Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023), Pittsburgh, PA, USA, 31 July-4 August 2023
* Published in Proceedings of Machine Learning Research (PMLR), Volume 216: Uncertainty in Artificial Intelligence, pp. 1662-1672

Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification

Aug 30, 2023
Jasmina Gajcin, James McCarthy, Rahul Nair, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

A well-defined reward function is crucial for successful training of a reinforcement learning (RL) agent. However, defining a suitable reward function is a notoriously challenging task, especially in complex, multi-objective environments. Developers often have to resort to starting with an initial, potentially misspecified reward function and iteratively adjusting its parameters based on observed learned behavior. In this work, we aim to automate this process by proposing ITERS, an iterative reward shaping approach that uses human feedback to mitigate the effects of a misspecified reward function. Our approach allows the user to provide trajectory-level feedback on the agent's behavior during training, which can be integrated as a reward shaping signal in the following training iteration. We also allow the user to provide explanations of their feedback, which are used to augment the feedback and reduce user effort and feedback frequency. We evaluate ITERS in three environments and show that it can successfully correct misspecified reward functions.
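As a rough illustration of such a loop (a sketch only, not the authors' ITERS implementation), the following assumes hypothetical `train_agent`, `rollout`, and `ask_human` callables and folds trajectory-level feedback into a shaping term for the next iteration.

```python
# Hedged sketch of an iterative reward-shaping loop; `train_agent`, `rollout`
# and `ask_human` are hypothetical stand-ins for the training routine,
# trajectory sampler and human-feedback interface.
from typing import Callable, Dict, List, Tuple

Trajectory = List[Tuple[object, object]]                 # (state, action) pairs

def iterative_reward_shaping(
    train_agent: Callable[[Callable], object],           # reward fn -> trained policy
    rollout: Callable[[object], List[Trajectory]],       # policy -> sampled trajectories
    ask_human: Callable[[Trajectory], float],            # trajectory -> feedback, e.g. -1 or 0
    base_reward: Callable[[object, object], float],
    iterations: int = 5,
    shaping_weight: float = 0.1,
):
    shaping: Dict[Tuple[object, object], float] = {}

    def shaped_reward(state, action):
        return base_reward(state, action) + shaping_weight * shaping.get((state, action), 0.0)

    policy = None
    for _ in range(iterations):
        policy = train_agent(shaped_reward)              # train on the current shaped reward
        for traj in rollout(policy):
            feedback = ask_human(traj)                   # trajectory-level human judgement
            for state, action in traj:                   # spread it over the visited pairs
                shaping[(state, action)] = shaping.get((state, action), 0.0) + feedback
    return policy
```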

* 7 pages, 2 figures 

An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations

May 15, 2023
Achille Fokoue, Ibrahim Abdelaziz, Maxwell Crouse, Shajith Ikbal, Akihiro Kishimoto, Guilherme Lima, Ndivhuwo Makondo, Radu Marinescu

Using reinforcement learning for automated theorem proving has recently received much attention. Current approaches use representations of logical statements that often rely on the names used in these statements and, as a result, the models are generally not transferable from one domain to another. The size of these representations and whether to include the whole theory or only part of it are other important decisions that affect the performance of these approaches as well as their runtime efficiency. In this paper, we present NIAGRA, an ensemble Name InvAriant Graph RepresentAtion. NIAGRA addresses these problems by using 1) improved Graph Neural Networks for learning name-invariant formula representations that are tailored to their unique characteristics and 2) an efficient ensemble approach for automated theorem proving. Our experimental evaluation shows state-of-the-art performance on multiple datasets from different domains, with improvements of up to 10% compared to the best learning-based approaches. Furthermore, transfer learning experiments show that our approach significantly outperforms other learning-based approaches by up to 28%.
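To make the name-invariance idea concrete, here is a toy sketch (not NIAGRA's graph neural network encoder): symbols are renamed to canonical placeholders in order of first occurrence, so two formulas that differ only in the names used map to identical structures.

```python
# Toy illustration of name invariance (not NIAGRA's GNN): rename all symbols
# to canonical placeholders s0, s1, ... in order of first occurrence, so
# formulas that differ only in naming become identical.
def canonicalize(term, mapping=None):
    """`term` is a nested tuple such as ('mul', ('x',), ('add', ('x',), ('y',)))."""
    mapping = {} if mapping is None else mapping
    head, *args = term
    if head not in mapping:
        mapping[head] = f"s{len(mapping)}"
    return (mapping[head], *(canonicalize(a, mapping) for a in args))

f = ("mul", ("x",), ("add", ("x",), ("y",)))
g = ("times", ("a",), ("plus", ("a",), ("b",)))
assert canonicalize(f) == canonicalize(g)                # equal up to renaming
```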

* Accepted to IJCAI 2023 

AutoDOViz: Human-Centered Automation for Decision Optimization

Feb 19, 2023
Daniel Karl I. Weidele, Shazia Afzal, Abel N. Valente, Cole Makuch, Owen Cornec, Long Vu, Dharmashankar Subramanian, Werner Geyer, Rahul Nair, Inge Vejsbjerg, Radu Marinescu, Paulito Palmes, Elizabeth M. Daly, Loraine Franke, Daniel Haehn

We present AutoDOViz, an interactive user interface for automated decision optimization (AutoDO) using reinforcement learning (RL). Decision optimization (DO) has classically been practiced by dedicated DO researchers, where experts need to spend long periods of time fine-tuning a solution through trial and error. AutoML pipeline search has sought to make it easier for a data scientist to find the best machine learning pipeline by leveraging automation to search and tune the solution. More recently, these advances have been applied to the domain of AutoDO, with the similar goal of finding the best reinforcement learning pipeline through algorithm selection and parameter tuning. However, DO requires significantly more complex problem specification when compared to an ML problem. AutoDOViz seeks to lower the barrier of entry for data scientists in problem specification for reinforcement learning problems, leverage the benefits of AutoDO algorithms for RL pipeline search, and, finally, create visualizations and policy insights in order to facilitate the typically interactive process of communicating problem formulations and solution proposals between DO experts and domain experts. In this paper, we report our findings from semi-structured expert interviews with DO practitioners as well as business consultants, leading to design requirements for human-centered automation for DO with RL. We evaluate a system implementation with data scientists and find that they are significantly more open to engaging in DO after using our proposed solution. AutoDOViz further increases trust in RL agent models and makes the automated training and evaluation process more comprehensible. As shown for automation in other ML tasks, we also conclude that automation of RL for DO can benefit from the user, and vice versa, when the interface promotes human-in-the-loop interaction.
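For readers unfamiliar with the kind of AutoDO search the interface sits on top of, a toy random-search loop over RL algorithms and hyperparameters might look as follows; this is purely illustrative, and the search space, algorithm names, and `evaluate` function are hypothetical.

```python
# Toy sketch of an AutoDO/AutoRL search loop (illustrative only; AutoDOViz is
# a user interface over such a search, not this code).
import random

def auto_rl_search(evaluate, n_trials=20, seed=0):
    """evaluate(algorithm, params) -> score; higher is assumed better."""
    rng = random.Random(seed)
    search_space = {                                     # hypothetical algorithms and ranges
        "dqn": {"lr": [1e-4, 1e-3], "gamma": [0.95, 0.99]},
        "ppo": {"lr": [3e-4, 1e-3], "clip": [0.1, 0.2]},
    }
    best = (float("-inf"), None)
    for _ in range(n_trials):
        algo = rng.choice(list(search_space))            # algorithm selection
        params = {k: rng.choice(v) for k, v in search_space[algo].items()}
        score = evaluate(algo, params)                   # e.g. mean return of a trained agent
        if score > best[0]:
            best = (score, (algo, params))
    return best
```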

Boolean Decision Rules for Reinforcement Learning Policy Summarisation

Jul 18, 2022
James McCarthy, Rahul Nair, Elizabeth Daly, Radu Marinescu, Ivana Dusparic

Explainability of Reinforcement Learning (RL) policies remains a challenging research problem, particularly when considering RL in a safety context. Understanding the decisions and intentions of an RL policy offers avenues to incorporate safety into the policy by limiting undesirable actions. We propose the use of a Boolean Decision Rules model to create a post-hoc rule-based summary of an agent's policy. We evaluate our proposed approach using a DQN agent trained on an implementation of a lava gridworld and show that, given a hand-crafted feature representation of this gridworld, simple generalised rules can be created, giving a post-hoc explainable summary of the agent's policy. We discuss possible avenues to introduce safety into an RL agent's policy by using the rules generated by this rule-based model as constraints imposed on the agent's policy, as well as how creating simple rule summaries of an agent's policy may help in the debugging of RL agents.
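The post-hoc recipe can be sketched as follows, with a shallow scikit-learn decision tree standing in for the Boolean Decision Rules model used in the paper; `policy_action`, `sample_states`, and `featurize` are hypothetical placeholders for the trained DQN, a state sampler, and the hand-crafted features.

```python
# Hedged sketch of the post-hoc recipe: label sampled states with the trained
# agent's actions, then fit an interpretable surrogate. A decision tree is a
# stand-in for the Boolean Decision Rules model used in the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def summarise_policy(policy_action, sample_states, featurize, feature_names, max_depth=3):
    states = sample_states()                              # e.g. states seen during evaluation
    X = np.array([featurize(s) for s in states])          # hand-crafted feature representation
    y = np.array([policy_action(s) for s in states])      # agent's chosen action per state
    surrogate = DecisionTreeClassifier(max_depth=max_depth).fit(X, y)
    return export_text(surrogate, feature_names=feature_names)   # readable rule summary
```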

Contrastive Explanations for Comparing Preferences of Reinforcement Learning Agents

Dec 17, 2021
Jasmina Gajcin, Rahul Nair, Tejaswini Pedapati, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

In complex tasks where the reward function is not straightforward and consists of a set of objectives, multiple reinforcement learning (RL) policies that perform the task adequately but employ different strategies can be trained by adjusting the impact of individual objectives on the reward function. Understanding the differences in strategies between policies is necessary to enable users to choose between offered policies, and can help developers understand the different behaviors that emerge from various reward functions and training hyperparameters in RL systems. In this work, we compare the behavior of two policies trained on the same task but with different preferences over objectives. We propose a method for distinguishing differences in behavior that stem from different abilities from those that are a consequence of the two RL agents' opposing preferences. Furthermore, we use only data on preference-based differences to generate contrastive explanations about the agents' preferences. Finally, we test and evaluate our approach on an autonomous driving task and compare the behavior of a safety-oriented policy and one that prefers speed.
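A crude sketch of one ingredient only, not the authors' method: a disagreement state is kept as preference-based when both agents appear competent there, here proxied by both value estimates exceeding a threshold; `act_a`/`act_b` and `value_a`/`value_b` are hypothetical policy and value-estimate callables.

```python
# Rough sketch: separate preference-based disagreements from ability-based
# ones using a simple "both agents value this state highly" proxy.
def preference_disagreements(states, act_a, act_b, value_a, value_b, threshold=0.0):
    return [
        s for s in states
        if act_a(s) != act_b(s) and min(value_a(s), value_b(s)) > threshold
    ]
```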

* 7 pages, 3 figures 

Logical Credal Networks

Sep 25, 2021
Haifeng Qian, Radu Marinescu, Alexander Gray, Debarun Bhattacharjya, Francisco Barahona, Tian Gao, Ryan Riegel, Pravinda Sahu

This paper introduces Logical Credal Networks, an expressive probabilistic logic that generalizes many prior models that combine logic and probability. Given imprecise information represented by probability bounds and conditional probability bounds of logic formulas, this logic specifies a set of probability distributions over all interpretations. On the one hand, our approach allows propositional and first-order logic formulas with few restrictions, e.g., without requiring acyclicity. On the other hand, it has a Markov condition similar to Bayesian networks and Markov random fields that is critical in real-world applications. Having both these properties makes this logic unique, and we investigate its performance on maximum a posteriori inference tasks, including solving Mastermind games with uncertainty and detecting credit card fraud. The results show that the proposed method outperforms existing approaches, and its advantage lies in aggregating multiple sources of imprecise information.
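A tiny propositional illustration of the general bounding idea, not the LCN inference algorithm and ignoring its first-order features and Markov condition: probability bounds on formulas constrain a distribution over all interpretations, and tight bounds on a query formula follow from two linear programs over that polytope.

```python
# Propositional-only sketch: encode probability bounds on formulas as linear
# constraints over the 2^n interpretations and bound a query formula via LP.
from itertools import product
from scipy.optimize import linprog

def query_bounds(variables, constraints, query):
    """constraints: list of (formula, lo, hi), where a formula is a callable
    over a dict of truth values, e.g. lambda w: (not w["a"]) or w["b"]."""
    worlds = [dict(zip(variables, vals)) for vals in product([False, True], repeat=len(variables))]
    A_ub, b_ub = [], []
    for formula, lo, hi in constraints:
        row = [1.0 if formula(w) else 0.0 for w in worlds]
        A_ub.append([-x for x in row]); b_ub.append(-lo)      # P(formula) >= lo
        A_ub.append(row); b_ub.append(hi)                     # P(formula) <= hi
    A_eq, b_eq = [[1.0] * len(worlds)], [1.0]                 # distribution sums to 1
    c = [1.0 if query(w) else 0.0 for w in worlds]
    low = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=(0, 1))   # minimise P(query)
    high = linprog([-x for x in c], A_ub, b_ub, A_eq, b_eq, bounds=(0, 1))
    return low.fun, -high.fun

# Example: 0.7 <= P(a) <= 0.9 and P(a -> b) >= 0.8; what can P(b) be?
lo, hi = query_bounds(
    ["a", "b"],
    [(lambda w: w["a"], 0.7, 0.9), (lambda w: (not w["a"]) or w["b"], 0.8, 1.0)],
    lambda w: w["b"],
)
print(round(lo, 2), round(hi, 2))                             # -> 0.5 1.0
```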

Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization

Jul 14, 2021
Paulito P. Palmes, Akihiro Kishimoto, Radu Marinescu, Parikshit Ram, Elizabeth Daly

The pipeline optimization problem in machine learning requires simultaneous optimization of pipeline structures and parameter adaptation of their elements. An elegant way to express these structures can help lessen the complexity of managing and analyzing their performance across different choices of optimization strategy. With these issues in mind, we created the AutoMLPipeline (AMLP) toolkit, which facilitates the creation and evaluation of complex machine learning pipeline structures using simple expressions. We use AMLP to find optimal pipeline signatures, datamine them, and use these datamined features to speed up learning and prediction. We formulated a two-stage pipeline optimization with surrogate modeling in AMLP which, in less than 5 minutes of AMLP computation time, outperforms other AutoML approaches given a 4-hour time budget.
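The two-stage idea can be sketched generically (in Python rather than AMLP's own Julia expressions, with hypothetical `encode` and `evaluate` callables): evaluate a handful of pipelines for real, fit a surrogate on their encodings, then spend the remaining budget on the candidates the surrogate ranks highest.

```python
# Generic two-stage surrogate-modeling sketch; not the AMLP API.
import random
from sklearn.ensemble import RandomForestRegressor

def surrogate_search(candidates, encode, evaluate, n_init=10, budget=20, seed=0):
    """candidates: list of pipeline descriptions; encode(p) -> numeric feature
    vector; evaluate(p) -> validation score (higher assumed better)."""
    rng = random.Random(seed)
    tried = rng.sample(candidates, n_init)                # stage 1: real evaluations
    scores = [evaluate(p) for p in tried]
    surrogate = RandomForestRegressor(random_state=seed)
    surrogate.fit([encode(p) for p in tried], scores)
    rest = [p for p in candidates if p not in tried]
    rest.sort(key=lambda p: surrogate.predict([encode(p)])[0], reverse=True)
    for p in rest[: budget - n_init]:                     # stage 2: surrogate-guided picks
        tried.append(p)
        scores.append(evaluate(p))
    best = max(range(len(tried)), key=scores.__getitem__)
    return tried[best], scores[best]
```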

Generating Dialogue Agents via Automated Planning

Feb 02, 2019
Adi Botea, Christian Muise, Shubham Agarwal, Oznur Alkan, Ondrej Bajgar, Elizabeth Daly, Akihiro Kishimoto, Luis Lastras, Radu Marinescu, Josef Ondrej, Pablo Pedemonte, Miroslav Vodolan

Dialogue systems have many applications, such as customer support or question answering. Typically, they have been limited to shallow single-turn interactions. However, more advanced applications such as career coaching or planning a trip require much more complex multi-turn dialogue. Current limitations of conversational systems have made it difficult to support applications that require personalization, customization, and context-dependent interactions. We tackle this challenging problem by using domain-independent AI planning to automatically create dialogue plans, customized to guide a dialogue towards achieving a given goal. The input includes a library of atomic dialogue actions, an initial state of the dialogue, and a goal. Dialogue plans are plugged into a dialogue system capable of orchestrating their execution. Use cases demonstrate the viability of the approach. Our work on dialogue planning has been integrated into a product, and it is in the process of being deployed into another.
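To convey the flavor of plan-based dialogue, here is a toy STRIPS-style forward planner over made-up dialogue actions; it is illustrative only, since the paper compiles much richer dialogue specifications into full AI planning technology.

```python
# Toy STRIPS-style forward planner: actions have preconditions, add and delete
# lists over facts; breadth-first search finds an action sequence to the goal.
from collections import deque

def plan(initial, goal, actions):
    """initial, goal: sets of facts; actions: dict name -> (preconds, add, delete)."""
    start = frozenset(initial)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for name, (pre, add, delete) in actions.items():
            if pre <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None

# Hypothetical atomic dialogue actions for a simple question-answering flow.
actions = {
    "greet":     (set(),             {"greeted"},     set()),
    "ask_topic": ({"greeted"},       {"topic_known"}, set()),
    "answer":    ({"topic_known"},   {"answered"},    set()),
    "close":     ({"answered"},      {"closed"},      set()),
}
print(plan(set(), {"closed"}, actions))   # -> ['greet', 'ask_topic', 'answer', 'close']
```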

* Accepted at the AAAI-2019 DEEP-DIAL workshop 