"Time": models, code, and papers

Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning

Mar 26, 2022
Jie Zhang, Jun Li, Yijin Zhang, Qingqing Wu, Xiongwei Wu, Feng Shu, Shi Jin, Wen Chen

Intelligent reflecting surfaces (IRSs) are envisioned to be widely applied in future wireless networks. In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting. Aiming to maximize the long-term average achievable system rate, we formulate an optimization problem that jointly designs the transmit beamforming at the base station (BS) and the discrete phase shift beamforming at the IRSs, subject to constraints on transmit power, user data rate requirements and IRS energy buffer size. Considering time-varying channels and stochastic arrivals of energy harvested by the IRSs, we first formulate the problem as a Markov decision process (MDP) and then develop a novel multi-agent Q-mix (MAQ) framework with two layers to decouple the optimization parameters. The higher layer optimizes the phase shift resolutions, and the lower one handles phase shift beamforming and power allocation. Since the phase shift optimization is an integer programming problem with a large-scale action space, we improve MAQ by incorporating the Wolpertinger method, yielding the MAQ-WP algorithm, which achieves a sub-optimal solution with a reduced action-space dimension. In addition, as MAQ-WP still requires high complexity to achieve good performance, we propose a policy gradient-based MAQ algorithm, namely MAQ-PG, which maps the discrete phase shift actions into a continuous space at the cost of a slight performance loss. Simulation results demonstrate that the proposed MAQ-WP and MAQ-PG algorithms converge faster and achieve data rate improvements of 10.7% and 8.8%, respectively, over the conventional multi-agent DDPG.

* 14 pages, 11 figures
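
The Wolpertinger idea mentioned in the abstract above can be illustrated with a minimal sketch: a continuous "proto-action" is mapped to its k nearest discrete actions, and the candidate with the highest Q-value is kept. This is not the authors' MAQ-WP implementation. The single-element phase set, the circular distance, and the `q_value` callable are all assumptions made for illustration.

```python
import math
from typing import Callable, List


def discrete_phases(bits: int) -> List[float]:
    """Uniformly spaced phase shifts in [0, 2*pi) for a given bit resolution."""
    levels = 2 ** bits
    return [2 * math.pi * i / levels for i in range(levels)]


def circular_distance(a: float, b: float) -> float:
    """Shortest angular distance between two phases."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def wolpertinger_select(
    proto: float,
    actions: List[float],
    q_value: Callable[[float], float],
    k: int,
) -> float:
    """Pick the best of the k discrete actions nearest to the proto-action.

    `q_value` is a hypothetical stand-in for a learned critic.
    """
    nearest = sorted(actions, key=lambda a: circular_distance(a, proto))[:k]
    return max(nearest, key=q_value)
```

With k = 1 this reduces to plain rounding of the proto-action; a larger k lets the critic override the nearest neighbour, while still evaluating only k actions rather than the full discrete set.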

Reinforcement Learning with Dynamic Convex Risk Measures

Dec 26, 2021
Anthony Coache, Sebastian Jaimungal

We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to optimization problems in statistical arbitrage trading and obstacle avoidance robot control.

* 19 pages, 7 figures
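
As a small illustration of a convex risk measure, the sketch below estimates conditional value-at-risk (CVaR) from a sample of costs. CVaR is a standard coherent, and therefore convex, risk measure. It is shown here only as a static, one-step building block; the dynamic, time-consistent construction and the policy gradient updates described in the abstract are not reproduced. The function name and the tail-size convention are assumptions.

```python
import math
from typing import Sequence


def cvar(costs: Sequence[float], alpha: float) -> float:
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of costs.

    Higher cost is worse. alpha = 0 gives the plain mean; alpha close to 1
    focuses on the most extreme outcomes.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if not costs:
        raise ValueError("costs must be non-empty")
    ordered = sorted(costs, reverse=True)
    # Always keep at least one sample in the tail.
    n_tail = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    tail = ordered[:n_tail]
    return sum(tail) / len(tail)
```

A risk-sensitive agent would minimize a quantity like `cvar(costs, alpha)` instead of the sample mean, which penalizes policies with heavy loss tails.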

Closed-Form Second-Order Partial Derivatives of Rigid-Body Inverse Dynamics

Mar 03, 2022
Shubham Singh, Ryan P. Russell, Patrick M. Wensing

Optimization-based control methods for robots often rely on first-order approximations of the dynamics, as in iLQR. Using second-order approximations of the dynamics is expensive due to the costly second-order partial derivatives of the dynamics with respect to the state and control. Current approaches for calculating these derivatives typically use automatic differentiation (AD) and chain-rule accumulation, or finite differences. In this paper, for the first time, we present closed-form analytical second-order partial derivatives of inverse dynamics for rigid-body systems with floating base and multi-DoF joints. A new extension of spatial vector algebra is proposed that enables the analysis. A recursive $\mathcal{O}(Nd^2)$ algorithm is also provided, where $N$ is the number of bodies and $d$ is the depth of the kinematic tree. A comparison with AD in CasADi shows speedups of 1.5-3$\times$ for serial kinematic trees with $N > 5$, and a C++ implementation shows runtimes of $\approx$400 $\mu$s for a quadruped.

* submitted to IROS 2022
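
For context, the finite-difference baseline mentioned in the abstract above can be sketched as a central-difference Hessian of a scalar function. This is the generic numerical approach, not the closed-form method the paper proposes. The toy function in the usage note is an assumption and is not a rigid-body model.

```python
from typing import Callable, List, Sequence


def hessian_fd(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    h: float = 1e-4,
) -> List[List[float]]:
    """Central-difference approximation of the Hessian of f at x.

    Costs four evaluations of f per entry, which is one reason
    finite-difference second derivatives are expensive.
    """
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):

            def shifted(si: int, sj: int) -> float:
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)

            hess[i][j] = (
                shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)
            ) / (4.0 * h * h)
    return hess
```

For example, with the toy function `f = lambda x: math.sin(x[0]) * x[1] ** 2`, the mixed entry should approximate `2 * x[1] * math.cos(x[0])`. Accuracy depends on the step `h`, which trades truncation error against round-off error.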

Drawing Inductor Layout with a Reinforcement Learning Agent: Method and Application for VCO Inductors

Feb 25, 2022
Cameron Haigh, Zichen Zhang, Negar Hassanpour, Khurram Javed, Yingying Fu, Shayan Shahramian, Shawn Zhang, Jun Luo

Design of Voltage-Controlled Oscillator (VCO) inductors is a laborious and time-consuming task that is conventionally done manually by human experts. In this paper, we propose a framework for automating the design of VCO inductors, using Reinforcement Learning (RL). We formulate the problem as a sequential procedure, where wire segments are drawn one after another, until a complete inductor is created. We then employ an RL agent to learn to draw inductors that meet certain target specifications. In light of the need to tweak the target specifications throughout the circuit design cycle, we also develop a variant in which the agent can learn to quickly adapt to draw new inductors for moderately different target specifications. Our empirical results show that the proposed framework is successful at automatically generating VCO inductors that meet or exceed the target specification.

Virtual Histological Staining of Label-Free Total Absorption Photoacoustic Remote Sensing (TA-PARS)

Apr 01, 2022
Marian Boktor, Benjamin Ecclestone, Vlad Pekar, Deepak Dinakaran, John R. Mackey, Paul Fieguth, Parsin Haji Reza

Histopathological visualizations are a pillar of modern medicine and biological research. Surgical oncology relies exclusively on post-operative histology to determine definitive surgical success and guide adjuvant treatments. The current histology workflow is based on bright-field microscopic assessment of histochemically stained tissues and has some major limitations. For example, the preparation of stained specimens for bright-field assessment requires lengthy sample processing, delaying interventions for days or even weeks. Hence, there is a pressing need for improved histopathology methods. In this paper, we present a deep-learning-based approach for virtual label-free histochemical staining of total-absorption photoacoustic remote sensing (TA-PARS) images of unstained tissue. TA-PARS provides an array of directly measured label-free contrasts such as scattering and total absorption (radiative and non-radiative), ideal for developing H&E colorizations without the need to infer arbitrary tissue structures. We use a Pix2Pix generative adversarial network (GAN) to develop visualizations analogous to H&E staining from label-free TA-PARS images. Thin sections of human skin tissue were first virtually stained with the TA-PARS and then chemically stained with H&E, producing a one-to-one comparison between the virtual and chemical staining. The one-to-one matched virtually and chemically stained images exhibit high concordance, validating the digital colorization of the TA-PARS images against the gold standard H&E. TA-PARS images were reviewed by four dermatologic pathologists, who confirmed that they are of diagnostic quality and that resolution, contrast, and color permitted interpretation as if they were H&E. The presented approach paves the way for the development of TA-PARS slide-free histology, which promises to dramatically reduce the time from specimen resection to histological imaging.

* 16 pages, 8 figures

Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents

Mar 16, 2022
Jian Zhao, Youpeng Zhao, Weixun Wang, Mingyu Yang, Xunhan Hu, Wengang Zhou, Jianye Hao, Houqiang Li

Multi-agent reinforcement learning is difficult to apply in practice, partly because of the gap between simulated and real-world scenarios. One reason for the gap is that simulated systems always assume that the agents work normally all the time, while in practice one or more agents may unexpectedly "crash" during the coordination process due to inevitable hardware or software failures. Such crashes destroy the cooperation among agents, leading to performance degradation. In this work, we present a formal formulation of a cooperative multi-agent reinforcement learning system with unexpected crashes. To enhance the robustness of the system to crashes, we propose a coach-assisted multi-agent reinforcement learning framework, which introduces a virtual coach agent to adjust the crash rate during training. We design three coaching strategies and a re-sampling strategy for our coach agent. To the best of our knowledge, this work is the first to study unexpected crashes in multi-agent systems. Extensive experiments on grid-world and StarCraft II micromanagement tasks demonstrate the efficacy of the adaptive strategy compared with the fixed crash rate strategy and the curriculum learning strategy. The ablation study further illustrates the effectiveness of our re-sampling strategy.

Deep Residual Error and Bag-of-Tricks Learning for Gravitational Wave Surrogate Modeling

Mar 16, 2022
Styliani-Christina Fragkouli, Paraskevi Nousi, Nikolaos Passalis, Panagiotis Iosif, Nikolaos Stergioulas, Anastasios Tefas

Deep learning methods have been employed in gravitational-wave astronomy to accelerate the construction of surrogate waveforms for the inspiral of spin-aligned black hole binaries, among other applications. We demonstrate that the residual error of an artificial neural network that models the coefficients of the surrogate waveform expansion (especially those of the phase of the waveform) has sufficient structure to be learnable by a second network. By adding this second network, we were able to reduce the maximum mismatch for waveforms in a validation set by more than an order of magnitude. We also explored several other ideas for improving the accuracy of the surrogate model, such as the exploitation of similarities between waveforms, the augmentation of the training set, the dissection of the input space, the use of dedicated networks per output coefficient, and output augmentation. In several cases, small improvements can be observed, but the most significant improvement still comes from the addition of a second network that models the residual error. Since the residual error for more general surrogate waveform models (e.g. when eccentricity is included) may also have a specific structure, one can expect our method to be applicable to cases where the gain in accuracy could lead to significant gains in computational time.
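
The residual-learning idea in the abstract above can be shown with a toy sketch: a first model is fitted to the data, and a second model is fitted to what the first one leaves unexplained. Simple least-squares lines stand in for the neural networks here, and the `feature` argument is a hypothetical choice. None of this reflects the authors' architecture or data.

```python
from typing import Callable, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def two_stage_fit(
    xs: Sequence[float],
    ys: Sequence[float],
    feature: Callable[[float], float],
):
    """Fit a base model, then a second model on the base model's residuals."""
    a1, b1 = fit_line(xs, ys)
    residuals = [y - (a1 * x + b1) for x, y in zip(xs, ys)]
    # The second model sees a different feature of the input (an assumption).
    a2, b2 = fit_line([feature(x) for x in xs], residuals)

    def base(x: float) -> float:
        return a1 * x + b1

    def corrected(x: float) -> float:
        return base(x) + a2 * feature(x) + b2

    return base, corrected


def max_error(model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Largest absolute prediction error over the sample."""
    return max(abs(model(x) - y) for x, y in zip(xs, ys))
```

The correction helps only when the residual has learnable structure, which is the observation the abstract reports for the waveform coefficients.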

Successful Recovery of an Observed Meteorite Fall Using Drones and Machine Learning

Mar 03, 2022
Seamus L. Anderson, Martin C. Towner, John Fairweather, Philip A. Bland, Hadrien A. R. Devillepoix, Eleanor K. Sansom, Martin Cupak, Patrick M. Shober, Gretchen K. Benedix

We report the first-time recovery of a fresh meteorite fall using a drone and a machine learning algorithm. A fireball on 1 April 2021 was observed over Western Australia by the Desert Fireball Network, for which a fall area was calculated for the predicted surviving mass. A search team arrived on site and surveyed a 5.1 km$^2$ area over a 4-day period. A convolutional neural network, trained on previously recovered meteorites with fusion crusts, processed the images on our field computer after each flight. Meteorite candidates identified by the algorithm were sorted by team members using two user interfaces to eliminate false positives. Surviving candidates were revisited with a smaller drone and imaged in higher resolution, before being eliminated or finally visited in person. The 70 g meteorite was recovered within 50 m of the calculated fall line, demonstrating the effectiveness of this methodology, which will facilitate the efficient collection of many more observed meteorite falls.

* 4 Figures, 1 Table, 10 pages

A Survey of Real-Time Social-Based Traffic Detection

Jul 07, 2020
Hashim Abu-gellban

Online traffic news websites do not always announce local traffic events in real time. Text mining and machine learning techniques can be applied to the Twitter stream to perform event detection and thereby build a real-time traffic detection system. In this survey paper, we discuss the current state-of-the-art techniques for detecting traffic events in real time, focusing on five papers [1, 2, 3, 4, 5]. Among these, applying text mining techniques and SVM classifiers in paper [2] gave the best results (i.e. 95.75% accuracy and 95.8% F1-score).

Self-supervised HDR Imaging from Motion and Exposure Cues

Mar 23, 2022
Michal Nazarczuk, Sibi Catley-Chandar, Ales Leonardis, Eduardo Pérez Pellitero

Recent High Dynamic Range (HDR) techniques extend the capabilities of current cameras, where scenes with a wide range of illumination cannot be accurately captured with a single low-dynamic-range (LDR) image. This is generally accomplished by capturing several LDR images with varying exposure values, whose information is then incorporated into a merged HDR image. While such approaches work well for static scenes, dynamic scenes pose several challenges, mostly related to the difficulty of finding reliable pixel correspondences. Data-driven approaches tackle the problem by learning an end-to-end mapping with paired LDR-HDR training data, but in practice generating such HDR ground-truth labels for dynamic scenes is time-consuming and requires complex procedures that assume control of certain dynamic elements of the scene (e.g. actor pose) and repeatable lighting conditions (stop-motion capturing). In this work, we propose a novel self-supervised approach for learnable HDR estimation that alleviates the need for HDR ground-truth labels. We propose to leverage the internal statistics of LDR images to create HDR pseudo-labels. We separately exploit static and well-exposed parts of the input images, which, in conjunction with synthetic illumination clipping and motion augmentation, provide high-quality training examples. Experimental results show that the HDR models trained using our proposed self-supervision approach achieve performance competitive with those trained under full supervision, and are to a large extent superior to previous methods that likewise do not require any supervision.
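
The classical multi-exposure merge that the abstract above describes for static scenes can be sketched for a single pixel as follows. This is the conventional weighted-average baseline under the assumption of a linear sensor response with values clipped to [0, 1]. It is not the self-supervised method proposed in the paper, and the hat-shaped weight and helper names are illustrative choices.

```python
from typing import Sequence


def capture(radiance: float, exposure_time: float) -> float:
    """Simulate a linear LDR sensor that clips at 1.0 (an assumption)."""
    return min(1.0, radiance * exposure_time)


def hat_weight(z: float) -> float:
    """Trust mid-range pixel values most; weight 0 at 0.0 and 1.0."""
    return 1.0 - abs(2.0 * z - 1.0)


def merge_pixel(
    values: Sequence[float],
    exposure_times: Sequence[float],
    eps: float = 1e-8,
) -> float:
    """Weighted average of per-exposure radiance estimates z / t."""
    num = sum(hat_weight(z) * z / t for z, t in zip(values, exposure_times))
    den = sum(hat_weight(z) for z in values)
    if den < eps:
        # Every sample is fully under- or over-exposed: fall back to the mean.
        return sum(z / t for z, t in zip(values, exposure_times)) / len(values)
    return num / den
```

The merge assumes that the same scene point lands on the same pixel in every exposure. That assumption fails when objects move between captures, which is the correspondence problem the abstract raises for dynamic scenes.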
