Tim Baarslag

Automated Peer-to-peer Negotiation for Energy Contract Settlements in Residential Cooperatives

Nov 26, 2019
Shantanu Chakraborty, Tim Baarslag, Michael Kaisers

This paper presents an automated peer-to-peer negotiation strategy for settling energy contracts among prosumers in a Residential Energy Cooperative, accounting for heterogeneous prosumer preferences. The heterogeneity arises from prosumers evaluating energy contracts through multiple societal and environmental criteria, and from their private preferences over those criteria. The prosumers engage in bilateral negotiations with peers to mutually agree on periodic energy contracts/loans consisting of the energy volume to be exchanged in that period and the return time of the exchanged energy. The negotiating prosumers navigate a common negotiation domain of potential energy contracts and evaluate those contracts, based on their valuations of the underlying criteria, with a utility function that is robust against generation and demand uncertainty. From repeated interactions, a prosumer gradually learns about the compatibility of its peers in reaching energy contracts that are closer to Nash solutions. Empirical evaluation on real demand, generation and storage profiles at multiple system scales illustrates that the proposed negotiation-based strategy can increase system efficiency (measured by utilitarian social welfare) and fairness (measured by Nash social welfare) over a baseline strategy and an individual flexibility control strategy representing the status quo. We thus elicit system benefits from peer-to-peer flexibility exchange without any central coordination or market operator, providing a simple yet flexible and effective paradigm that complements existing markets.
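
As a rough illustration of the kind of protocol the abstract describes, the sketch below runs a generic alternating-offers bilateral negotiation over a shared domain of (energy volume, return time) contracts. The linear-additive utility, the concession schedule, and all names (Prosumer, negotiate, the criterion weights) are illustrative assumptions, not the paper's actual model or code.

```python
# Minimal sketch (not the paper's exact protocol): alternating-offers bilateral
# negotiation over a shared domain of energy contracts (volume, return_time).
# Criterion weights, utility form, and acceptance rule are illustrative assumptions.
import itertools
import random

# Shared negotiation domain: energy volume (kWh) to exchange and return period (hours).
DOMAIN = list(itertools.product([0.5, 1.0, 1.5, 2.0], [2, 4, 8, 12]))

class Prosumer:
    def __init__(self, name, weights):
        self.name = name
        self.weights = weights  # private weights over (volume, promptness) criteria

    def utility(self, contract):
        volume, return_time = contract
        # Toy linear-additive utility: value received energy, discount late returns.
        w_vol, w_time = self.weights
        return w_vol * (volume / 2.0) + w_time * (1.0 - return_time / 12.0)

    def next_bid(self, round_, max_rounds):
        # Concede over time: the target utility decays linearly toward 0.5.
        target = 1.0 - 0.5 * (round_ / max_rounds)
        candidates = [c for c in DOMAIN if self.utility(c) >= target]
        return random.choice(candidates) if candidates else max(DOMAIN, key=self.utility)

    def accepts(self, offer, round_, max_rounds):
        # Accept if the peer's offer is at least as good as our own upcoming bid.
        return self.utility(offer) >= self.utility(self.next_bid(round_, max_rounds))

def negotiate(a, b, max_rounds=20):
    for r in range(max_rounds):
        proposer, responder = (a, b) if r % 2 == 0 else (b, a)
        offer = proposer.next_bid(r, max_rounds)
        if responder.accepts(offer, r, max_rounds):
            return offer
    return None  # disagreement

if __name__ == "__main__":
    alice = Prosumer("alice", weights=(0.7, 0.3))
    bob = Prosumer("bob", weights=(0.3, 0.7))
    print("Agreed contract:", negotiate(alice, bob))
```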

* arXiv admin note: substantial text overlap with arXiv:1807.10978 

An Optimal Rewiring Strategy for Reinforcement Social Learning in Cooperative Multiagent Systems

May 13, 2018
Hongyao Tang, Li Wang, Zan Wang, Tim Baarslag, Jianye Hao

Multiagent coordination in cooperative multiagent systems (MASs) has been widely studied in both the fixed-agent repeated-interaction setting and the static social learning framework. However, two aspects of the dynamics of real-world multiagent scenarios are missing from existing work. First, the network topology can be dynamic: agents may change their connections through rewiring over the course of their interactions. Second, the game matrix between each pair of agents may not be static and is usually not known a priori. Both the network dynamics and the game uncertainty increase the difficulty of coordination among agents. In this paper, we consider a dynamic multiagent social learning environment in which each agent can choose to rewire to potential partners and interacts with randomly chosen neighbors in each round. We propose an optimal rewiring strategy for agents to select the most beneficial peers to interact with, with the aim of maximizing the accumulated payoff over repeated interactions. We empirically demonstrate the effectiveness and robustness of our approach by comparing it with benchmark strategies. We also investigate the performance of three representative learning strategies under our social learning framework with optimal rewiring.
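
The sketch below illustrates the general idea of payoff-driven rewiring in a social learning setting: an agent keeps running payoff estimates per peer and swaps its worst connection for the most promising candidate when the expected gain exceeds a rewiring cost. The class name RewiringAgent, the estimate update, and the cost parameter are assumptions for illustration; the paper derives an optimal rewiring rule rather than this greedy heuristic.

```python
# Illustrative sketch (assumed simplification of the paper's optimal rewiring rule):
# keep running payoff estimates per peer and rewire the worst connection to the
# most promising candidate whenever the expected gain exceeds the rewiring cost.
from collections import defaultdict

class RewiringAgent:
    def __init__(self, neighbors, candidates, rewiring_cost=0.1, prior=0.5):
        self.neighbors = set(neighbors)
        self.candidates = set(candidates)     # potential partners not yet connected
        self.rewiring_cost = rewiring_cost
        self.estimates = defaultdict(lambda: [prior, 1])  # peer -> [mean payoff, count]

    def record(self, peer, payoff):
        # Incremental running mean of the payoff obtained with this peer.
        mean, n = self.estimates[peer]
        self.estimates[peer] = [mean + (payoff - mean) / (n + 1), n + 1]

    def expected(self, peer):
        return self.estimates[peer][0]

    def maybe_rewire(self):
        if not self.neighbors or not self.candidates:
            return None
        worst = min(self.neighbors, key=self.expected)
        best_new = max(self.candidates, key=self.expected)
        gain = self.expected(best_new) - self.expected(worst)
        if gain > self.rewiring_cost:
            self.neighbors.remove(worst)
            self.candidates.add(worst)
            self.candidates.remove(best_new)
            self.neighbors.add(best_new)
            return (worst, best_new)
        return None

if __name__ == "__main__":
    agent = RewiringAgent(neighbors=["n1", "n2"], candidates=["c1", "c2"])
    agent.record("n1", 0.1)
    agent.record("c1", 0.9)   # e.g. estimated from exploration or pre-interaction signals
    print("rewired:", agent.maybe_rewire())
```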

* 9 pages, 5 figures 

A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

Jul 28, 2017
Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, Enrique Munoz de Cote

The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, each making a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to assess the novelty and significance of new work. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework with five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into this taxonomy, using these categories and key characteristics of the environment (e.g., observability) and of the opponents' adaptation behaviour (e.g., smooth, abrupt). To clarify further, we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield the most merit, and point to promising avenues for future research.
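
To make the first two categories concrete, here is a toy contrast between an "ignore" learner (sample-average updates that assume stationarity) and a "forget" learner (constant step size that discounts old experience) on a two-action task whose payoffs swap halfway through, standing in for an opponent that abruptly changes strategy. The task, parameters, and the run function are illustrative assumptions, not taken from the survey.

```python
# Toy illustration of two of the survey's categories against a drifting opponent:
# "ignore" (sample-average updates, assumes stationarity) vs. "forget"
# (constant step size, exponentially discounts old experience).
import random

def run(agent_kind, episodes=5000, drift_at=2500):
    q = [0.0, 0.0]          # action-value estimates for two actions
    counts = [0, 0]
    total = 0.0
    for t in range(episodes):
        # Opponent-induced non-stationarity: payoffs of the two actions swap mid-run.
        means = (1.0, 0.0) if t < drift_at else (0.0, 1.0)
        a = random.randrange(2) if random.random() < 0.1 else max(range(2), key=lambda i: q[i])
        r = random.gauss(means[a], 0.1)
        total += r
        counts[a] += 1
        step = 1.0 / counts[a] if agent_kind == "ignore" else 0.1   # forget: constant alpha
        q[a] += step * (r - q[a])
    return total / episodes

if __name__ == "__main__":
    print("ignore :", round(run("ignore"), 3))
    print("forget :", round(run("forget"), 3))
```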

* 64 pages, 7 figures. Under review since November 2016 

Can we reach Pareto optimal outcomes using bottom-up approaches?

Jul 03, 2016
Victor Sanchez-Anguix, Reyhan Aydogan, Tim Baarslag, Catholijn M. Jonker

Traditionally, researchers in decision making have focused on attempting to reach Pareto optimality using horizontal approaches, where optimality is calculated taking every participant into account at the same time. This can prove difficult in practice (e.g., due to conflict, mistrust, or a lack of information sharing). In this paper, we explore the possibility of achieving Pareto optimal outcomes in a group by using a bottom-up approach: discovering Pareto optimal outcomes by interacting in subgroups. We analytically show that, in the case of strict, transitive, and complete preferences, Pareto optimal outcomes in a subgroup are also Pareto optimal in a supergroup of those agents. We then empirically analyze the prospective usability and practicality of bottom-up approaches in a variety of decision-making domains.
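
The central claim, that under strict, transitive, and complete preferences a Pareto optimal outcome for a subgroup remains Pareto optimal for the supergroup of those agents, can be checked numerically with a small brute-force sketch like the one below. The outcome set, agent names, and random preference ranks are arbitrary choices made for illustration.

```python
# Small numerical check of the stated claim: with strict preferences (no ties),
# any outcome that is Pareto optimal for a subgroup is also Pareto optimal
# for the full group containing that subgroup.
import random

def dominates(y, x, prefs):
    """y Pareto-dominates x: every agent ranks y at least as high as x,
    and at least one agent ranks it strictly higher."""
    at_least = all(prefs[a][y] >= prefs[a][x] for a in prefs)
    strictly = any(prefs[a][y] > prefs[a][x] for a in prefs)
    return at_least and strictly

def pareto_optimal(outcomes, prefs):
    return {x for x in outcomes if not any(dominates(y, x, prefs) for y in outcomes if y != x)}

random.seed(0)
outcomes = list(range(8))
# Strict preferences: each agent's ranking is a random permutation over the outcomes.
group = {a: dict(zip(outcomes, random.sample(range(len(outcomes)), len(outcomes))))
         for a in ["a1", "a2", "a3", "a4"]}
subgroup = {a: group[a] for a in ["a1", "a2"]}

sub_po = pareto_optimal(outcomes, subgroup)
full_po = pareto_optimal(outcomes, group)
assert sub_po <= full_po, "subgroup Pareto optima should survive in the supergroup"
print("subgroup PO:", sorted(sub_po), "is contained in full-group PO:", sorted(full_po))
```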

* 2nd Workshop on Conflict Resolution in Decision Making (COREDEMA@ECAI2016) 

Optimistic Risk Perception in the Temporal Difference error Explains the Relation between Risk-taking, Gambling, Sensation-seeking and Low Fear

Feb 03, 2015
Joost Broekens, Tim Baarslag

Understanding the affective, cognitive and behavioural processes involved in risk-taking is essential for treatment and for setting environmental conditions to limit damage. Using Temporal Difference Reinforcement Learning (TDRL), we computationally investigated the effect of optimism in risk perception in a variety of goal-oriented tasks. Optimism in risk perception was studied by varying the calculation of the Temporal Difference error, i.e., delta, in three ways: realistic (stochastically correct), optimistic (assuming action control), and overly optimistic (assuming outcome control). We show that, in the gambling task, individuals with a 'healthy' perception of control (action optimism) do not develop gambling behaviour, while individuals with an 'unhealthy' perception of control (outcome optimism) do. We further show that high intensity of sensations and low levels of fear co-occur due to optimistic risk perception, and that overly optimistic risk perception (outcome optimism) results in risk-taking and persistent gambling behaviour in addition to high intensity of sensations. We discuss how our results replicate risk-taking-related phenomena.
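
The sketch below shows one plausible reading of the three delta variants on a toy recurring gamble: a realistic (SARSA-style) TD error, an "action control" variant that bootstraps on the best available action, and an "outcome control" variant that additionally substitutes the best possible reward for the sampled one. The task, parameters, and update rules are assumptions made for illustration; the paper's actual formalization may differ. Under these assumptions, only the outcome-control learner assigns positive value to the gamble and keeps gambling.

```python
# Hedged sketch of three TD-error (delta) variants on a toy recurring gamble.
#   realistic         : SARSA-style delta, stochastically correct
#   optimistic        : assumes control over the next action (bootstrap on max Q)
#   overly optimistic : additionally assumes control over the gamble's outcome
import random

REWARDS = {"abstain": [(1.0, 0.0)], "gamble": [(0.2, 5.0), (0.8, -2.0)]}

def best_reward(a):
    return max(r for _, r in REWARDS[a])

def sample_reward(a):
    probs, rs = zip(*REWARDS[a])
    return random.choices(rs, weights=probs)[0]

def choose(q, eps=0.1):
    return random.choice(list(q)) if random.random() < eps else max(q, key=q.get)

def run(mode, steps=20000, alpha=0.05, gamma=0.9):
    q = {"abstain": 0.0, "gamble": 0.0}
    a = choose(q)
    gambles = 0
    for t in range(steps):
        r = sample_reward(a)
        a_next = choose(q)
        if mode == "realistic":
            delta = r + gamma * q[a_next] - q[a]
        elif mode == "optimistic":
            delta = r + gamma * max(q.values()) - q[a]
        else:  # "overly_optimistic": the imagined best outcome enters the TD error
            delta = best_reward(a) + gamma * max(q.values()) - q[a]
        q[a] += alpha * delta
        if t >= steps // 2 and a == "gamble":
            gambles += 1
        a = a_next
    return q, gambles / (steps // 2)

if __name__ == "__main__":
    random.seed(1)
    for mode in ("realistic", "optimistic", "overly_optimistic"):
        q, rate = run(mode)
        rounded = {k: round(v, 2) for k, v in q.items()}
        print(mode, rounded, "gamble rate:", round(rate, 2))
```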
