Reinforcement learning (RL) is a powerful tool for solving complex decision-making problems, but its lack of transparency and interpretability has been a major challenge in domains where decisions have significant real-world consequences. In this paper, we propose a novel Advantage Actor-Critic with Reasoner (A2CR), which can be easily applied to Actor-Critic-based RL models and make them interpretable. A2CR consists of three interconnected networks: the Policy Network, the Value Network, and the Reasoner Network. By predefining and classifying the underlying purpose of the actor's actions, A2CR automatically generates a more comprehensive and interpretable paradigm for understanding the agent's decision-making process. It offers a range of functionalities such as purpose-based saliency, early failure detection, and model supervision, thereby promoting responsible and trustworthy RL. Evaluations conducted in action-rich Super Mario Bros environments yield intriguing findings: Reasoner-predicted label proportions decrease for "Breakout" and increase for "Hovering" as the exploration level of the RL algorithm intensifies. Additionally, purpose-based saliencies are more focused and comprehensible.
The World Health Organization (WHO) declared COVID-19 a pandemic on 11 March 2020, when there were 118K cases across several countries and territories. Numerous researchers have worked on forecasting the number of confirmed cases, since anticipating case growth helps governments make difficult decisions about easing lockdown orders in their countries. Easing these orders helps the many people who have lost their jobs and supports severely affected businesses. Our research investigates the relation between Google search trends and the spread of the novel coronavirus (COVID-19) across countries worldwide, in order to predict the number of cases. We perform a correlation analysis between the keywords of related Google search trends and the number of confirmed cases reported by the WHO. We then apply several machine learning techniques (Multiple Linear Regression, Non-negative Integer Regression, Deep Neural Network) to forecast the number of confirmed cases globally, based on historical data as well as on hybrid data that include Google search trends. Our results show that Google search trends are highly associated with the number of reported confirmed cases, and that the Deep Learning approach outperforms the other forecasting techniques. We believe this is a promising approach not only for forecasting confirmed COVID-19 cases, but also for similar forecasting problems that are associated with related Google trends.
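The correlation analysis described in this abstract can be illustrated with a minimal sketch. It assumes a Pearson correlation between each keyword's search-interest series and the confirmed-case series; the abstract does not state which correlation measure was used, and the helper names (`pearson`, `rank_keywords`), keyword labels, and numbers below are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)

def rank_keywords(trends, cases):
    """Return (keyword, correlation) pairs, strongest absolute correlation first."""
    scores = {kw: pearson(series, cases) for kw, series in trends.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up illustrative numbers, not data from the study.
cases = [10, 20, 40, 80, 160]
trends = {
    "keyword_a": [1, 2, 4, 8, 16],  # exactly proportional to the case counts
    "keyword_b": [5, 3, 6, 2, 4],   # only weakly related
}
ranking = rank_keywords(trends, cases)
```

Keywords near the top of such a ranking would be natural candidates for the additional features that make up the hybrid data.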
It is widely acknowledged that addiction relapse is highly associated with spatial-temporal factors such as specific places or time periods. Current studies suggest that those factors can be utilized for better relapse interventions; however, there is no relapse prevention application that makes use of them. In this paper, we introduce a mobile app called "Addict Free", which records user profiles, tracks relapse history, and summarizes recovery statistics to help users better understand their recovery. The app also builds a relapse recovery community, which allows users to ask for advice and encouragement and to share relapse prevention experience. Moreover, machine learning algorithms that ingest spatial and temporal factors are used to predict relapse, and based on these predictions a recovery recommendation algorithm recommends helpful addiction diversion activities. By interacting with users, the app aims to provide smart suggestions that help stop relapse, especially for users with alcohol and tobacco addiction.
The United States is currently experiencing an unprecedented opioid crisis, and opioid overdose has become a leading cause of injury and death. Effective opioid addiction recovery calls for not only medical treatments, but also behavioral interventions for impacted individuals. In this paper, we study communication and behavior patterns of patients with opioid use disorder (OUD) from social media, intending to demonstrate how existing information from common activities, such as online social networking, might lead to better prediction, evaluation, and ultimately prevention of relapses. From a novel multi-disciplinary analytic perspective, we characterize opioid addiction behavior patterns by analyzing opioid groups from Reddit.com, including modeling online discussion topics, analyzing text co-occurrence and correlations, and identifying emotional states of people with OUD. These quantitative analyses are of practical importance and demonstrate innovative ways to use information from online social media to create technology that can assist in relapse prevention.
A robust model for time series forecasting is highly important in many domains, including but not limited to financial forecasting, air temperature, and electricity consumption. To improve forecasting performance, traditional approaches usually require additional feature sets. However, adding feature sets from different data sources is not always feasible because such data may not be accessible. In this paper, we propose a novel self-boosted mechanism in which the original time series is decomposed into multiple time series. These time series play the role of additional features: the closely related group is fed into a multi-task learning model, and the loosely related group is fed into a multi-view learning component to exploit its complementary information. We use three real-world datasets to validate our model and show the superiority of our proposed method over existing state-of-the-art baseline methods.
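The idea of deriving extra feature series from a single series can be sketched as follows. The abstract specifies neither the decomposition nor the grouping criterion, so everything here is an assumption made for illustration: a trailing moving-average split into trend and residual, and a hypothetical correlation threshold that separates "closely related" from "loosely related" components.

```python
from math import sqrt

def moving_average(series, window):
    """Trailing moving average; early points average over the available prefix."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def decompose(series, window=3):
    """Assumed scheme: split a series into a smooth trend and the residual around it."""
    trend = moving_average(series, window)
    residual = [x - t for x, t in zip(series, trend)]
    return {"trend": trend, "residual": residual}

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def group_by_relatedness(series, components, threshold=0.9):
    """Hypothetical rule: |correlation with the original| >= threshold means closely related."""
    close, loose = [], []
    for name, comp in components.items():
        target = close if abs(pearson(series, comp)) >= threshold else loose
        target.append(name)
    return close, loose

series = [1, 2, 3, 4, 5, 6]  # toy data, not from the paper's datasets
components = decompose(series, window=3)
close, loose = group_by_relatedness(series, components)
```

Under these assumptions, the `close` group would be routed to a multi-task learner and the `loose` group to a multi-view learner.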
Efficient job scheduling in data centers under heterogeneous complexity is crucial but challenging, since it involves the allocation of multi-dimensional resources over time and space. To adapt to the complex computing environment in data centers, we propose an innovative Advantage Actor-Critic (A2C) deep reinforcement learning based approach for job scheduling, called DeepScheduler. DeepScheduler consists of two agents: the actor, which is responsible for learning the scheduling policy automatically, and the critic, which reduces the estimation error. Unlike previous policy gradient approaches, DeepScheduler is designed to reduce the gradient estimation variance and to update parameters efficiently. We show that DeepScheduler achieves competitive scheduling performance on both simulated workloads and real data collected from an academic data center.
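In generic A2C, the critic's value estimate serves as a baseline that is subtracted from the observed return, and the resulting advantage is what weights the policy-gradient update. The sketch below shows only that textbook computation; it is not DeepScheduler's implementation, and the function names, rewards, and values are illustrative assumptions.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout, computed backwards.

    bootstrap_value is the critic's estimate for the state that follows
    the last step (0.0 if the episode terminated there).
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage = discounted return minus the critic's value for that state."""
    rets = discounted_returns(rewards, bootstrap_value, gamma)
    return [g - v for g, v in zip(rets, values)]

# Toy rollout: per-step rewards and the critic's value predictions.
rewards = [1.0, 0.0, 2.0]
values = [2.0, 1.5, 1.0]
adv = advantages(rewards, values, bootstrap_value=0.0, gamma=0.5)
# adv == [-0.5, -0.5, 1.0]
```

A positive advantage raises the probability of the action taken and a negative one lowers it. Subtracting the baseline leaves the expected gradient unchanged while reducing its variance.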
Precisely forecasting wind speed is essential for wind power producers and grid operators. However, this task is challenging due to the stochasticity of wind speed. To accurately predict short-term wind speed under uncertainty, this paper proposes a multi-variable stacked LSTM model (MSLSTM). The proposed method utilizes multiple historical meteorological variables, such as wind speed, temperature, humidity, pressure, dew point, and solar radiation, to accurately predict wind speeds. The prediction performance is extensively assessed using real data collected in West Texas, USA. The experimental results show that the proposed MSLSTM captures and learns uncertainties better while delivering competitive performance.
Understanding and accurately predicting within-field spatial variability of crop yield play a key role in site-specific management of crop inputs, such as irrigation water and fertilizer, for optimized crop production. However, such a task is challenged by the complex interaction between crop growth and environmental and managerial factors, such as climate, soil conditions, tillage, and irrigation. In this paper, we present a novel Spatial-temporal Multi-Task Learning algorithm for within-field crop yield prediction in West Texas from 2001 to 2003. This algorithm integrates multiple heterogeneous data sources to learn different features simultaneously, and aggregates spatial-temporal features by introducing a weighted regularizer into the loss functions. In our comprehensive experiments, the algorithm consistently outperforms conventional methods, suggesting a promising approach that advances crop prediction research.
Opioid addiction is a severe public health threat in the U.S., causing many deaths and social problems. Accurate relapse prediction is of practical importance for recovering patients, since it enables timely relapse prevention that helps patients stay clean. In this paper, we introduce a Generative Adversarial Network (GAN) model to predict addiction relapses based on sentiment images and social influences. Experimental results on real social media data from Reddit.com demonstrate that the GAN model delivers better performance than comparable alternative techniques. The sentiment images generated by the model show that relapse is closely connected with two emotions, "joy" and "negative". This work is one of the first attempts to predict relapses using massive social media data and generative adversarial nets. The proposed method, combined with knowledge of social media mining, has the potential to revolutionize the practice of opioid addiction prevention and treatment.
A crucial and time-sensitive task when any disaster occurs is to rescue victims and distribute resources to the right groups and locations. This task is challenging in populated urban areas, due to the huge burst of help requests generated in a very short period. To improve the efficiency of the emergency response in the immediate aftermath of a disaster, we propose a heuristic multi-agent reinforcement learning scheduling algorithm, named ResQ, which can effectively schedule the rapid deployment of volunteers to rescue victims in dynamic settings. The core concept is to quickly identify victims and volunteers from social network data and then schedule rescue parties with an adaptive learning algorithm. This framework performs two key functions: 1) identifying trapped victims and rescue volunteers, and 2) optimizing the volunteers' rescue strategy in a complex time-sensitive environment. The proposed ResQ algorithm speeds up training through a heuristic function that reduces the state-action space by prioritizing a particular set of actions over others. Experimental results show that the proposed heuristic multi-agent reinforcement learning based scheduling outperforms several state-of-the-art methods in terms of both reward rate and response time.
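One common way a heuristic can shrink the state-action space is to filter the candidate actions before the agent chooses among them. The sketch below illustrates that general idea with epsilon-greedy selection over a tabular Q-function; the abstract does not describe the actual ResQ heuristic, so the one-dimensional setting, the `moves_closer` rule, and all function names are hypothetical.

```python
import random

def allowed_actions(state, actions, heuristic):
    """Keep only actions the heuristic accepts; fall back to all actions if none pass."""
    kept = [a for a in actions if heuristic(state, a)]
    return kept or list(actions)

def select_action(q_table, state, actions, heuristic, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to the heuristic's candidate set."""
    candidates = allowed_actions(state, actions, heuristic)
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda a: q_table.get((state, a), 0.0))

def moves_closer(state, action):
    """Hypothetical heuristic: a volunteer on a line should approach the victim,
    or stay put once the victim has been reached."""
    pos, victim = state
    if pos == victim:
        return action == 0
    return abs(pos + action - victim) < abs(pos - victim)

ACTIONS = [-1, 0, 1]   # step left, stay, step right
state = (0, 3)         # volunteer at position 0, victim at position 3
choice = select_action({}, state, ACTIONS, moves_closer, epsilon=0.0)
```

Because the agent only explores and updates values for the filtered actions, fewer state-action pairs need to be visited during training.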