Centralized Training for Decentralized Execution, where agents are trained offline using centralized information but execute in a decentralized manner online, has gained popularity in the multi-agent reinforcement learning community. In particular, actor-critic methods with a centralized critic and decentralized actors are a common instance of this idea. However, the implications of using a centralized critic in this context are not fully discussed and understood even though it is the standard choice of many algorithms. We therefore formally analyze centralized and decentralized critic approaches, providing a deeper understanding of the implications of critic choice. Because our theory makes unrealistic assumptions, we also empirically compare the centralized and decentralized critic methods over a wide set of environments to validate our theories and to provide practical advice. We show that there exist misconceptions regarding centralized critics in the current literature and show that the centralized critic design is not strictly beneficial, but rather both centralized and decentralized critics have different pros and cons that should be taken into account by algorithm designers.
Based on hierarchical partitions, we construct Haar-type tight framelets on any compact set $K\subseteq \mathbb{R}^d$. In particular, on the unit block $[0,1]^d$, such tight framelets can be built with adaptivity and directionality. We show that the adaptive directional Haar tight framelet systems can be used for digraph signal representations. Some examples are provided to illustrate the results of this paper.
In real-world multi-robot systems, performing high-quality, collaborative behaviors requires robots to asynchronously reason about high-level action selection at varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions, with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive actions and the scalability of our approaches.
In many real-world multi-robot tasks, high-quality solutions often require a team of robots to perform asynchronous actions under decentralized control. Multi-agent reinforcement learning methods have difficulty learning decentralized policies because the environment appears to be non-stationary when other agents are learning at the same time. In this paper, we address this challenge by proposing a macro-action-based decentralized multi-agent double deep recurrent Q-net (MacDec-MADDRQN), which introduces a new double Q-updating rule to train each decentralized Q-net using a centralized Q-net for action selection. A generalized version of MacDec-MADDRQN with two separate training environments, called Parallel-MacDec-MADDRQN, is also presented to cope with the uncertainty in adopting either centralized or decentralized exploration. The advantages and the practical nature of our methods are demonstrated by achieving near-centralized results in simulation experiments and permitting real robots to accomplish a warehouse tool delivery task in an efficient way.
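One possible reading of the double Q-updating rule in the abstract above is sketched below: the centralized Q-net selects the greedy joint action, and each decentralized Q-net evaluates only its own component of that joint action. This is an illustrative assumption, not the authors' implementation. The function name `decentralized_double_q_target`, the dictionary representation of centralized Q-values, and the single-step discount are all hypothetical; in particular, the sketch ignores macro-action durations and recurrent network state.

```python
from typing import Dict, Sequence, Tuple

JointAction = Tuple[int, ...]


def decentralized_double_q_target(
    reward: float,
    gamma: float,
    centralized_q_next: Dict[JointAction, float],
    agent_target_q_next: Sequence[float],
    agent_index: int,
    done: bool = False,
) -> float:
    """Hypothetical double-Q target for one decentralized agent.

    centralized_q_next: centralized Q-values for each joint action at the
        next step (assumed to be given as a plain dictionary).
    agent_target_q_next: this agent's target-network Q-values at the next
        step, indexed by the agent's own action.
    """
    if done:
        # No bootstrapping from a terminal step.
        return reward
    # Action selection: greedy joint action under the centralized Q-net.
    best_joint = max(centralized_q_next, key=centralized_q_next.get)
    # Action evaluation: the agent's own target net scores its component.
    chosen_action = best_joint[agent_index]
    return reward + gamma * agent_target_q_next[chosen_action]


# Toy usage with two agents, each having two actions.
central_q = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 0.5}
target = decentralized_double_q_target(
    reward=1.0,
    gamma=0.5,
    centralized_q_next=central_q,
    agent_target_q_next=[10.0, 4.0],
    agent_index=1,
)
print(target)  # 3.0
```

In the toy usage, the centralized values favor the joint action `(0, 1)`, so agent 1 evaluates its own action 1 even though its local values alone would prefer action 0. This mirrors the idea of using centralized information for action selection during training.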
Traditional methods for achieving high localization accuracy on tactile sensors usually involve a matrix of miniaturized individual sensors distributed on the area of interest. This approach usually comes at a price of increased complexity in fabrication and circuitry, and can be hard to adapt to non-planar geometries. We propose a method where sensing terminals are embedded in a volume of soft material. Mechanical strain in this material results in a measurable signal between any two given terminals. By having multiple terminals and pairing them against each other in all possible combinations, we obtain a rich signal set using few wires. We mine this data to learn the mapping between the signals we extract and the contact parameters of interest. Our approach is general enough that it can be applied with different transduction methods, and achieves high accuracy in identifying indentation location and depth. Moreover, this method lends itself to simple fabrication techniques and makes no assumption about the underlying geometry, potentially simplifying future integration in robot hands.
Achieving high spatial resolution in contact sensing for robotic manipulation often comes at the price of increased complexity in fabrication and integration. One traditional approach is to fabricate a large number of taxels, each delivering an individual, isolated response to a stimulus. In contrast, we propose a method where the sensor simply consists of a continuous volume of piezoresistive elastomer with a number of electrodes embedded inside. We measure piezoresistive effects between all pairs of electrodes in the set, and count on this rich signal set containing the information needed to pinpoint contact location with high accuracy using regression algorithms. In our validation experiments, we demonstrate submillimeter median accuracy in locating contact on a 10mm by 16mm sensor using only four electrodes (creating six unique pairs). In addition to extracting more information from fewer wires, this approach lends itself to simple fabrication methods and makes no assumptions about the underlying geometry, simplifying future integration on robot fingers.
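The pairing scheme in the abstract above, where every electrode is measured against every other, can be illustrated with a short sketch. The helper name `electrode_pairs` and the integer electrode labels are assumptions made for illustration; the sketch only enumerates the pairs and does not model the piezoresistive measurement or the regression step.

```python
from itertools import combinations
from typing import List, Tuple


def electrode_pairs(num_electrodes: int) -> List[Tuple[int, int]]:
    """Return all unordered pairs of electrode indices (hypothetical labels 0..n-1)."""
    return list(combinations(range(num_electrodes), 2))


pairs = electrode_pairs(4)
print(pairs)       # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(pairs))  # 6
```

With n electrodes the number of unique pairs is n(n-1)/2, so four electrodes give the six signals mentioned in the abstract, and the signal count grows quadratically while the wire count grows linearly.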
Fully wearable hand rehabilitation and assistive devices could extend training and improve quality of life for patients affected by hand impairments. However, such devices must deliver meaningful manipulation capabilities in a small and lightweight package. In this context, this paper investigates the capability of single-actuator devices to assist whole-hand movement patterns through a network of exotendons. Our prototypes combine a single linear actuator (mounted on a forearm splint) with a network of exotendons (routed on the surface of a soft glove). We investigated two possible tendon network configurations: one that produces full finger extension (overcoming flexor spasticity), and one that combines proximal flexion with distal extension at each finger. In experiments with stroke survivors, we measured the force levels needed to overcome various levels of spasticity and open the hand for grasping using the first of these configurations, and qualitatively demonstrated the ability to execute fingertip grasps using the second. Our results support the feasibility of developing future wearable devices able to assist a range of manipulation tasks.
A key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. The practicality of existing works addressing this challenge is limited to small-scale synchronous decision-making scenarios or to a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast, this paper considers a more realistic class of problems where a team of asynchronous agents with limited observation and communication capabilities needs to compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best response accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically as well as demonstrated empirically in simulation and hardware.