
Fernando Acero, Parisa Zehtabi, Nicolas Marchesotti, Michael Cashmore, Daniele Magazzeni, Manuela Veloso

Portfolio optimization involves determining the optimal allocation of portfolio assets in order to maximize a given investment objective. Traditionally, some form of mean-variance optimization is used with the aim of maximizing returns while minimizing risk; more recently, however, deep reinforcement learning formulations have been explored. Increasingly, investors have shown interest in incorporating ESG objectives when making investment decisions, and modifications to the classical mean-variance optimization framework have been developed to accommodate them. In this work, we study the use of deep reinforcement learning for responsible portfolio optimization by incorporating ESG states and objectives, and we provide comparisons against modified mean-variance approaches. Our results show that deep reinforcement learning policies can provide competitive performance against mean-variance approaches for responsible portfolio allocation across additive and multiplicative utility functions of financial and ESG responsibility objectives.
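
To make the reward structure concrete, here is a minimal sketch of how additive and multiplicative utilities of a financial return and a portfolio ESG score might be combined into a scalar reward for a reinforcement learning policy. The weighting parameter `lam` and the linear ESG aggregation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def additive_utility(ret, esg, lam=0.5):
    """Weighted sum of financial return and ESG score (lam is an illustrative weight)."""
    return (1.0 - lam) * ret + lam * esg

def multiplicative_utility(ret, esg):
    """Product-form utility; rewards allocations that do well on both axes."""
    return ret * esg

def step_reward(weights, asset_returns, asset_esg, utility=additive_utility):
    """One-step reward for an allocation `weights` over the assets.

    weights      : array summing to 1 (the policy's action)
    asset_returns: per-asset returns over the step
    asset_esg    : per-asset ESG scores, e.g. normalised to [0, 1]
    """
    ret = float(np.dot(weights, asset_returns))  # portfolio return
    esg = float(np.dot(weights, asset_esg))      # portfolio ESG score
    return utility(ret, esg)

# Example: three assets under an equal-weight allocation
w = np.array([1 / 3, 1 / 3, 1 / 3])
r = np.array([0.02, -0.01, 0.015])
e = np.array([0.9, 0.4, 0.7])
print(step_reward(w, r, e))                                    # additive utility
print(step_reward(w, r, e, utility=multiplicative_utility))    # multiplicative utility
```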


Industries frequently adjust their facility networks by opening new branches in promising areas and closing branches in areas where they expect low profits. In this paper, we examine a particular class of facility location problems. Our objective is to minimize the loss of sales resulting from the removal of several retail stores. However, estimating sales accurately is expensive and time-consuming. To overcome this challenge, we leverage Monte Carlo Tree Search (MCTS) assisted by a surrogate model that computes evaluations faster. Results suggest that MCTS supported by a fast surrogate function can generate solutions faster while remaining consistent with the solutions found by MCTS without the surrogate.
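
As an illustration of the core idea, the following is a hedged sketch of MCTS over store-closure decisions in which rollout evaluation calls a fast surrogate instead of the expensive sales estimator; the store data, the surrogate's noise model, and the UCT constant are all hypothetical.

```python
import math, random

random.seed(0)
STORES = list(range(8))                                  # hypothetical store ids
K = 3                                                    # number of stores to close
TRUE_LOSS = {s: random.uniform(5, 20) for s in STORES}   # hidden per-store sales loss

def expensive_eval(closed):
    """Stand-in for the slow, accurate sales-loss estimator."""
    return sum(TRUE_LOSS[s] for s in closed)

def surrogate_eval(closed):
    """Fast surrogate: accurate up to noise, far cheaper to query."""
    return sum(TRUE_LOSS[s] + random.gauss(0, 1) for s in closed)

class Node:
    def __init__(self, closed):
        self.closed = frozenset(closed)
        self.children, self.visits, self.total = {}, 0, 0.0

def uct_select(node, c=1.4):
    # negate the mean loss because we are minimising
    return max(node.children.values(),
               key=lambda ch: -ch.total / max(ch.visits, 1)
                              + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1)))

def rollout(closed):
    closed = set(closed)
    while len(closed) < K:
        closed.add(random.choice([s for s in STORES if s not in closed]))
    return surrogate_eval(closed)        # the surrogate replaces the slow estimator here

def mcts(iters=2000):
    root = Node(set())
    for _ in range(iters):
        node, path = root, [root]
        while len(node.closed) < K:      # selection / expansion
            untried = [s for s in STORES if s not in node.closed and s not in node.children]
            if untried:
                s = random.choice(untried)
                node.children[s] = Node(node.closed | {s})
                node = node.children[s]
                path.append(node)
                break
            node = uct_select(node)
            path.append(node)
        loss = rollout(node.closed)
        for n in path:                   # back-propagate the surrogate loss
            n.visits += 1
            n.total += loss
    node = root                          # extract the lowest-average-loss closure set
    while node.children and len(node.closed) < K:
        node = min(node.children.values(), key=lambda ch: ch.total / ch.visits)
    return node.closed

best = mcts()
print("close stores:", sorted(best), "loss (accurate estimator):", round(expensive_eval(best), 2))
```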


In many real-world scenarios, agents are involved in optimization problems. Since most of these scenarios are over-constrained, optimal solutions do not always satisfy all agents. Some agents might be unhappy and ask questions of the form ``Why does solution $S$ not satisfy property $P$?''. In this paper, we propose MAoE, a domain-independent approach to obtaining contrastive explanations by (i) generating a new solution $S^\prime$ where the property $P$ is enforced, while also minimizing the differences between $S$ and $S^\prime$; and (ii) highlighting the differences between the two solutions. Such explanations aim to help agents understand why the initial solution is better than what they expected. We have carried out a computational evaluation showing that MAoE can generate contrastive explanations for large multi-agent optimization problems. We have also performed an extensive user study in four different domains which shows that, after being presented with these explanations, humans' satisfaction with the original solution increases.
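
A toy sketch of the two-step scheme on a small assignment problem: re-solve with property $P$ enforced while minimizing the number of changes with respect to the original solution $S$, then report the differences. The agents, slots, capacities, and penalties below are invented for illustration, and brute-force enumeration stands in for a real solver.

```python
from itertools import product

# Toy assignment problem: each agent gets one of two slots, at most two agents per slot.
AGENTS = ["A", "B", "C"]
SLOTS = [1, 2]
CAPACITY = 2
PENALTY = {"A": 3, "B": 2, "C": 1}       # hypothetical cost of placing an agent in slot 2

def feasible(sol):
    return all(sum(1 for a in AGENTS if sol[a] == s) <= CAPACITY for s in SLOTS)

def cost(sol):
    return sum(PENALTY[a] for a in AGENTS if sol[a] == 2)

def contrastive_explanation(original, prop):
    """Step (i): closest feasible solution with `prop` enforced; step (ii): the diff."""
    best = None
    for vals in product(SLOTS, repeat=len(AGENTS)):
        cand = dict(zip(AGENTS, vals))
        if not (feasible(cand) and prop(cand)):
            continue
        dist = sum(1 for a in AGENTS if cand[a] != original[a])
        if best is None or (dist, cost(cand)) < best[0]:
            best = ((dist, cost(cand)), cand)
    if best is None:
        return "No feasible solution satisfies property P."
    alt = best[1]
    changes = [f"{a}: slot {original[a]} -> slot {alt[a]}"
               for a in AGENTS if original[a] != alt[a]]
    return (f"Enforcing P raises the objective from {cost(original)} to {cost(alt)}; "
            f"required changes: {changes}")

S = {"A": 1, "B": 1, "C": 2}             # the original optimal schedule (cost 1)
P = lambda sol: sol["C"] == 1            # agent C asks: why am I not in slot 1?
print(contrastive_explanation(S, P))
```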


Stochastic optimization (SO) attempts to offer optimal decisions in the presence of uncertainty. Often, the classical formulation of these problems becomes intractable due to (a) the number of scenarios required to capture the uncertainty and (b) the discrete nature of real-world planning problems. To overcome these tractability issues, practitioners turn to decomposition methods that divide the problem into smaller, more tractable sub-problems. The focal decomposition method of this paper is Benders decomposition (BD), which decomposes stochastic optimization problems on the basis of scenario independence. In this paper, we propose a method for accelerating BD with the aid of a surrogate model in place of an NP-hard integer master problem. With this acceleration we observe 30% faster average convergence compared to other accelerated BD implementations. We introduce a reinforcement learning agent as the surrogate and demonstrate how it can be used to solve a stochastic inventory management problem.
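
The following sketch illustrates the loop structure on a toy stochastic inventory (newsvendor-style) instance: exact subproblems supply both true recourse costs and Benders optimality cuts, while an epsilon-greedy bandit stands in for the paper's reinforcement learning surrogate that replaces the integer master problem. All problem data are invented, and the lower bound is computed by enumeration, which is only possible because the toy first-stage space is tiny.

```python
import random

random.seed(1)
C = 2.0                                   # first-stage unit order cost
H, P = 1.0, 4.0                           # holding / shortage penalties
SCEN = [(6, 0.3), (8, 0.5), (11, 0.2)]    # (demand, probability) scenarios
X_RANGE = range(0, 16)                    # integer first-stage decisions

def subproblem(x, d):
    """Exact second-stage (recourse) cost for one scenario, plus a subgradient in x."""
    if x >= d:
        return H * (x - d), H
    return P * (d - x), -P

def expected_recourse(x):
    cost, grad = 0.0, 0.0
    for d, p in SCEN:
        c, g = subproblem(x, d)
        cost += p * c
        grad += p * g
    return cost, grad

def surrogate_propose(q_values, eps=0.2):
    """Stand-in for the RL-agent surrogate that replaces the integer master:
    epsilon-greedy over estimated total costs instead of solving a MILP."""
    if random.random() < eps:
        return random.choice(list(X_RANGE))
    return min(q_values, key=q_values.get)

cuts = []                                 # Benders optimality cuts: theta >= a + b*x
q = {x: 0.0 for x in X_RANGE}             # surrogate's cost estimates (optimistic init)
incumbent, incumbent_cost = None, float("inf")
for it in range(50):
    x = surrogate_propose(q)
    rec, grad = expected_recourse(x)      # exact subproblem evaluation
    cuts.append((rec - grad * x, grad))   # valid cut since the recourse is convex in x
    total = C * x + rec
    q[x] = total                          # update the surrogate's estimate
    if total < incumbent_cost:
        incumbent, incumbent_cost = x, total
    # lower bound from the cut model (what the exact master would optimise)
    lb = min(C * y + max(a + b * y for a, b in cuts) for y in X_RANGE)
    if incumbent_cost - lb < 1e-9:
        break                             # the cuts certify optimality
print(f"order x = {incumbent}, cost = {incumbent_cost:.2f}, iterations = {it + 1}")
```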


Scheduling is the task of assigning a set of scarce resources distributed over time to a set of agents, who typically have preferences about the assignments they would like to receive. Due to the constrained nature of these problems, satisfying all agents' preferences is often infeasible, which might leave some agents unhappy with the resulting schedule. Providing explanations has been shown to increase satisfaction and trust in solutions produced by AI tools. However, it is particularly challenging to explain solutions that are influenced by, and have an impact on, multiple agents. In this paper, we introduce the EXPRES framework, which can explain why a given preference was unsatisfied in a given optimal schedule. The EXPRES framework consists of: (i) an explanation generator that, based on a Mixed-Integer Linear Programming model, finds the best set of reasons that can explain an unsatisfied preference; and (ii) an explanation parser, which translates the generated explanations into human-interpretable ones. Through simulations, we show that the explanation generator can efficiently scale to large instances. Finally, through a set of user studies within J.P. Morgan, we show that employees preferred the explanations generated by EXPRES over human-generated ones when considering workforce scheduling scenarios.
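
To illustrate the flavour of the explanation generator, here is a set-cover-style MILP sketch (using the PuLP library) that selects a minimum-weight set of candidate reasons covering the time slots where a preference failed; the reasons, weights, and coverage sets are hypothetical, and the actual EXPRES model is considerably richer.

```python
import pulp

# Hypothetical candidate reasons why an analyst's day-off preference was unsatisfied,
# each with a plausibility weight (lower = simpler explanation) and the slots it covers.
REASONS = {
    "min_staffing":   (1, {"slot_mon_am"}),
    "senior_on_call": (2, {"slot_mon_am", "slot_mon_pm"}),
    "colleague_pto":  (3, {"slot_mon_pm"}),
}
SLOTS_TO_EXPLAIN = {"slot_mon_am", "slot_mon_pm"}   # slots where the preference failed

prob = pulp.LpProblem("explanation", pulp.LpMinimize)
z = {r: pulp.LpVariable(f"use_{r}", cat="Binary") for r in REASONS}
prob += pulp.lpSum(REASONS[r][0] * z[r] for r in REASONS)      # prefer simple reasons
for s in SLOTS_TO_EXPLAIN:                                     # every slot needs a reason
    prob += pulp.lpSum(z[r] for r in REASONS if s in REASONS[r][1]) >= 1
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("explanation:", [r for r in REASONS if z[r].value() == 1])
```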


We consider a novel queuing problem where the decision-maker must choose to accept or reject tasks that arrive randomly at a no-buffer queue processed by $N$ identical servers. Each task has a price, which is a positive real number, and a class. Each class of task has a different price distribution and service rate, and arrives according to an inhomogeneous Poisson process. The objective is to decide which tasks to accept so that the total price of tasks processed is maximised over a finite horizon. We formulate the problem as a discrete-time Markov Decision Process (MDP) with a hybrid state space. We show that the optimal value function has a specific structure, which enables us to solve the hybrid MDP exactly. Moreover, we prove that as the time step is reduced, the discrete-time solution approaches the optimal solution to the original continuous-time problem. To improve the scalability of our approach to a greater number of task classes, we present an approximation based on state abstraction. We validate our approach on synthetic data, as well as on a real financial fraud data set, which is the motivating application for this work.
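
A simplified backward-induction sketch for a single task class: because the price enters the value additively, the optimal policy reduces to a time- and state-dependent price threshold, which is the kind of structure the abstract alludes to. The Bernoulli arrival approximation, the exponential price distribution, and all numeric parameters are illustrative assumptions, not the paper's exact construction.

```python
import math

N, T = 3, 100                 # servers, horizon (discrete time steps)
LAM = 0.6                     # arrival probability per step (small-dt Poisson approx.)
Q = 0.2                       # per-server completion probability per step
MEAN_PRICE = 5.0              # task prices ~ Exponential(mean MEAN_PRICE)

def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# V[t][s]: optimal expected reward-to-go with s busy servers at step t
V = [[0.0] * (N + 1) for _ in range(T + 1)]
threshold = [[math.inf] * (N + 1) for _ in range(T)]
for t in range(T - 1, -1, -1):
    # W[s]: expected next-step value after random service completions
    W = [sum(binom_pmf(s, k, Q) * V[t + 1][s - k] for k in range(s + 1))
         for s in range(N + 1)]
    for s in range(N + 1):
        if s < N:
            th = max(W[s] - W[s + 1], 0.0)   # accept an arrival iff price >= th
            threshold[t][s] = th
            # max(price + W[s+1], W[s]) = W[s] + max(price - th, 0), and
            # E[(price - th)^+] has a closed form for an exponential price
            V[t][s] = W[s] + LAM * MEAN_PRICE * math.exp(-th / MEAN_PRICE)
        else:
            V[t][s] = W[s]                    # all servers busy: must reject

print("value of empty system:", round(V[0][0], 2))
print("price thresholds at t=0:", [round(x, 2) for x in threshold[0]])
```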


One of the major limitations to the adoption of model-based planning and scheduling in practical applications is the need for costly re-planning when an incongruence between the observed reality and the formal model is encountered during execution. Robustness Envelopes characterize the set of possible contingencies that a plan is able to address without re-planning, but their exact computation is extremely expensive; furthermore, general robustness envelopes are not amenable to efficient execution. In this paper, we present a novel anytime algorithm to approximate Robustness Envelopes, making them scalable and executable. We demonstrate this with an experimental analysis showing the efficiency of the algorithm, and with a concrete case study where executing robustness envelopes significantly reduces the number of re-plannings.
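
As a rough illustration of why executable envelopes help, the sketch below reduces an envelope to per-parameter intervals and uses it as an execution monitor that triggers re-planning only when an observation falls outside; the paper's envelopes and its anytime approximation algorithm are more general than this box model, and all parameters are hypothetical.

```python
# Hypothetical envelope computed offline: per-parameter tolerances within which
# the current plan remains valid without re-planning.
ENVELOPE = {
    "drive_duration": (8.0, 14.0),    # seconds the drive action may take
    "battery_drain":  (0.0, 0.25),    # fraction of charge the action may use
}

def within_envelope(observed):
    """True if every observed parameter stays inside its interval, i.e. the
    plan can keep executing; otherwise re-planning is triggered."""
    return all(lo <= observed[p] <= hi for p, (lo, hi) in ENVELOPE.items())

# Execution monitor: re-plan only when observations leave the envelope
for obs in [{"drive_duration": 9.5,  "battery_drain": 0.10},
            {"drive_duration": 13.2, "battery_drain": 0.22},
            {"drive_duration": 15.1, "battery_drain": 0.18}]:
    print(obs, "->", "continue" if within_envelope(obs) else "re-plan")
```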
