Hyper-parameter tuning (HPT) is crucial for many machine learning (ML) algorithms. However, because the search space is large, HPT is usually time-consuming and resource-intensive. Many researchers now train machine learning models on public cloud resources, which is convenient yet expensive. Speeding up the HPT process while reducing its cost is therefore very important for cloud ML users. In this paper, we propose SpotTune, an approach that exploits transient revocable resources in the public cloud with tailored strategies to perform HPT in a parallel and cost-efficient manner. Orchestrating the HPT process on transient servers, SpotTune uses two main techniques, fine-grained cost-aware resource provisioning and ML training trend prediction, to reduce the monetary cost and runtime of HPT processes. Our evaluations show that SpotTune can reduce the cost by up to 90% and achieve a 16.61x improvement in performance-cost rate.
We propose using machine learning models for the direct synthesis of on-chip electromagnetic (EM) passive structures to enable rapid or even automated design and optimization of RF/mm-Wave circuits. As a proof of concept, we demonstrate the direct synthesis of a 1:1 transformer on a 45 nm SOI process using our proposed neural network model. Trained on pre-existing transformer S-parameter files and their corresponding geometric designs, the model predicts target geometric designs.
Adversarial examples were first investigated in computer vision: by adding carefully designed "noise" to the original input image, one obtains a perturbed image that humans cannot distinguish from the original yet that easily fools a well-trained classifier. In recent years, researchers have also demonstrated that similar methods can mislead deep reinforcement learning (DRL) agents that play video games from image inputs. However, although DRL has become increasingly popular in intelligent transportation systems, little research has investigated the impact of adversarial attacks on such systems, especially for algorithms that do not take images as inputs. In this work, we investigate several fast methods for generating adversarial examples that significantly degrade the performance of a well-trained DRL-based energy management system of an extended range electric delivery vehicle. The perturbed inputs are low-dimensional state representations that remain close to the original inputs, as quantified by different kinds of norms. Our work shows that, before DRL agents are applied to real-world transportation systems, adversarial examples in the form of cyber-attacks should be considered carefully, especially for applications that may lead to serious safety issues.
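To make the idea of a norm-bounded perturbation of a low-dimensional state concrete, here is a minimal sketch of a sign-gradient (FGSM-style) attack. The abstract above does not say which fast methods were used, so this technique is swapped in purely for illustration. The names `value_fn`, `fgsm_perturb`, and `epsilon` are hypothetical, and the gradient is estimated by finite differences rather than backpropagation.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central finite-difference gradient of scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(value_fn, state, epsilon):
    """Move each state component by epsilon in the direction that lowers value_fn.

    The result stays within an L-infinity ball of radius epsilon around state.
    value_fn is a hypothetical stand-in for the agent's value estimate.
    """
    grad = numerical_gradient(value_fn, state)
    return [s - epsilon * sign(g) for s, g in zip(state, grad)]


# Toy usage with a hypothetical linear value function.
toy_value = lambda s: 2.0 * s[0] - 3.0 * s[1]
adversarial_state = fgsm_perturb(toy_value, [0.5, 0.5], epsilon=0.1)
```

Bounding the perturbation in the L-infinity norm is only one choice. The abstract mentions several kinds of norms, and other norms would change the projection step.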
Increasing the fuel economy of hybrid electric vehicles (HEVs) and extended range electric vehicles (EREVs) through optimization-based energy management strategies (EMS) has been an active research area in transportation. However, it is difficult to apply optimization-based EMS to current in-use EREVs because too little is known about future trips and because such methods are computationally expensive for large-scale deployment. As a result, most past research has been validated on standard driving cycles or on recorded high-resolution data from past real driving cycles. This paper improves an in-use rule-based EMS that is used in a delivery vehicle fleet equipped with two-way vehicle-to-cloud connectivity. A physics model-guided online Bayesian framework is described and validated on a large number of in-use driving samples of EREVs used for last-mile package delivery. The framework includes a database, a preprocessing module, a vehicle model, and an online Bayesian algorithm module. It uses historical 0.2 Hz resolution trip data as input and outputs an updated parameter to the engine control logic on the vehicle to reduce fuel consumption on the next trip. The key contribution of this work is a framework that provides an immediate solution for reducing the fuel use of in-use EREVs. The framework was also demonstrated on real-world EREV delivery vehicles operating on actual routes. The results show an average fuel use reduction of 12.8% among tested vehicles over 155 real delivery trips. The presented framework is extendable to other EREV applications, including passenger vehicles, transit buses, and other vocational vehicles whose trips are similar from day to day.
Temporal modeling is key for action recognition in videos. It normally considers both short-range motions and long-range aggregations. In this paper, we propose a Temporal Excitation and Aggregation (TEA) block, including a motion excitation (ME) module and a multiple temporal aggregation (MTA) module, specifically designed to capture both short- and long-range temporal evolution. In particular, for short-range motion modeling, the ME module calculates the feature-level temporal differences from spatiotemporal features. It then uses the differences to excite the motion-sensitive channels of the features. In previous works, long-range temporal aggregation is typically achieved by stacking a large number of local temporal convolutions, each of which processes one local temporal window at a time. In contrast, the MTA module deforms the local convolution into a group of sub-convolutions, forming a hierarchical residual architecture. Without introducing additional parameters, the features are processed by a series of sub-convolutions, and each frame can complete multiple temporal aggregations with its neighborhoods. The equivalent receptive field along the temporal dimension is enlarged accordingly, making it possible to model long-range temporal relationships over distant frames. The two components of the TEA block are complementary in temporal modeling. Finally, our approach achieves impressive results at low FLOPs on several action recognition benchmarks, such as Kinetics, Something-Something, HMDB51, and UCF101, which confirms its effectiveness and efficiency.
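The motion-excitation idea in the abstract above (temporal feature differences used to re-weight channels) can be sketched as follows. This is a simplified illustration, not the TEA implementation. It assumes features have already been pooled to one vector per frame, omits all learned convolutions, and uses a hypothetical `tanh(|difference|)` gate so that static channels receive no extra weight.

```python
import math


def motion_excitation(frames):
    """Re-weight channels by how much they change between adjacent frames.

    frames: list of T per-frame feature vectors, each a list of C floats.
    Returns a list of the same shape. The gate form is an assumption chosen
    for illustration; the last frame has no successor, so its gate is zero.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    excited = []
    for t in range(num_frames):
        if t + 1 < num_frames:
            # Feature-level temporal difference between frame t+1 and frame t.
            diff = [frames[t + 1][c] - frames[t][c] for c in range(num_channels)]
        else:
            diff = [0.0] * num_channels
        # Gate in [0, 1): larger for channels that change more over time.
        gate = [math.tanh(abs(d)) for d in diff]
        # Residual form: the original feature plus its gated copy.
        excited.append([frames[t][c] * (1.0 + gate[c]) for c in range(num_channels)])
    return excited


# Toy usage: channel 0 is static, channel 1 changes between the two frames.
toy_frames = [[1.0, 2.0], [1.0, 4.0]]
toy_output = motion_excitation(toy_frames)
```

In this toy input only channel 1 of the first frame is amplified, which is the sense in which the differences excite motion-sensitive channels.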
Deep neural networks have been widely adopted in modern reinforcement learning (RL) algorithms with great empirical success in various domains. However, the large search space of training a neural network requires a significant amount of data, which makes current RL algorithms sample inefficient. Motivated by the fact that many environments with continuous state spaces have smooth transitions, we propose to learn a policy that behaves smoothly with respect to states. In contrast to policies parameterized by linear or reproducing kernel functions, where simple regularization techniques suffice to control smoothness, neural network based reinforcement learning algorithms have no readily available solution for learning a smooth policy. In this paper, we develop a new training framework, $\textbf{S}$mooth $\textbf{R}$egularized $\textbf{R}$einforcement $\textbf{L}$earning ($\textbf{SR}^2\textbf{L}$), in which the policy is trained with smoothness-inducing regularization. Such regularization effectively constrains the search space of the learning algorithms and enforces smoothness in the learned policy. We apply the proposed framework to both on-policy (TRPO) and off-policy (DDPG) algorithms. Through extensive experiments, we demonstrate that our method achieves improved sample efficiency.
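As a rough illustration of what a smoothness-inducing regularizer might look like, the sketch below penalizes how much a policy's output changes when the state is perturbed within a small box. The abstract above does not specify the exact regularizer. The squared-distance measure, the random sampling of perturbations, and the names `smoothness_penalty`, `epsilon`, and `n_samples` are all assumptions made for this example.

```python
import random


def smoothness_penalty(policy, states, epsilon=0.1, n_samples=8, seed=0):
    """Average, over states, of the largest sampled change in policy output.

    policy: function mapping a state (list of floats) to an action (list of floats).
    Perturbed states are drawn uniformly within +/- epsilon of each state
    component; sampling stands in for an exact worst-case search.
    """
    rng = random.Random(seed)
    total = 0.0
    for state in states:
        action = policy(state)
        worst = 0.0
        for _ in range(n_samples):
            perturbed = [x + rng.uniform(-epsilon, epsilon) for x in state]
            perturbed_action = policy(perturbed)
            # Squared Euclidean distance between the two actions.
            dist = sum((a - b) ** 2 for a, b in zip(action, perturbed_action))
            worst = max(worst, dist)
        total += worst
    return total / len(states)


# Toy usage: a steep policy is penalized more than a gentle one.
toy_states = [[0.0, 1.0], [0.5, -0.5]]
gentle_policy = lambda s: [0.1 * sum(s)]
steep_policy = lambda s: [10.0 * sum(s)]
gentle_value = smoothness_penalty(gentle_policy, toy_states)
steep_value = smoothness_penalty(steep_policy, toy_states)
```

In training, a term like this would typically be added to the usual policy objective with a weighting coefficient, so that the optimizer trades off return against smoothness.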
Researchers often have to deal with heterogeneous populations with mixed regression relationships, increasingly so in the era of data explosion. In such problems, when there are many candidate predictors, it is of interest not only to identify the predictors that are associated with the outcome, but also to distinguish the true sources of heterogeneity, i.e., to identify the predictors that have different effects among the clusters and thus are the true contributors to the formation of the clusters. We clarify the concept of a source of heterogeneity in a way that accounts for potential scale differences among the clusters, and we propose a regularized finite mixture effects regression to achieve heterogeneity pursuit and feature selection simultaneously. As the name suggests, the problem is formulated under an effects-model parameterization, in which the cluster labels are missing and the effect of each predictor on the outcome is decomposed into a common effect term and a set of cluster-specific terms. A constrained sparse estimation of these effects leads to the identification of both the variables with common effects and those with heterogeneous effects. We propose an efficient algorithm and show that our approach can achieve both estimation and selection consistency. Simulation studies further demonstrate the effectiveness of our method under various practical scenarios. Three applications are presented, namely, an imaging genetics study linking genetic factors and brain neuroimaging traits in Alzheimer's disease, a public health study exploring the association between suicide risk among adolescents and their school district characteristics, and a sports analytics study of how the salary levels of baseball players are associated with their performance and contractual status.
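One way to write the decomposition described in the abstract above is shown below. The notation is ours, not the source's. The sum-to-zero constraint is an assumed identifiability condition, and the adjustment for cluster-specific scales mentioned in the abstract is omitted.

```latex
\beta_{jk} = \beta_{j0} + \delta_{jk},
\qquad \sum_{k=1}^{K} \delta_{jk} = 0,
\qquad j = 1, \dots, p, \quad k = 1, \dots, K .
```

Here $\beta_{jk}$ is the effect of predictor $j$ in cluster $k$, $\beta_{j0}$ is its common effect, and $\delta_{jk}$ is its cluster-specific deviation. Under this reading, predictor $j$ is a source of heterogeneity when $\delta_{jk} \neq 0$ for some $k$, has only a common effect when $\beta_{j0} \neq 0$ and every $\delta_{jk} = 0$, and is irrelevant when all of these terms are zero.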
The Graph Neural Network (GNN) is a powerful model for learning representations and making predictions on graph data. Existing work on GNNs has largely defined the graph convolution as a weighted sum of the features of the connected nodes, which forms the representation of the target node. Nevertheless, the weighted sum assumes that the neighbor nodes are independent of each other and ignores possible interactions between them. When such interactions exist, for example when the co-occurrence of two neighbor nodes is a strong signal of the target node's characteristics, existing GNN models may fail to capture the signal. In this work, we argue for the importance of modeling the interactions between neighbor nodes in GNNs. We propose a new graph convolution operator, which augments the weighted sum with pairwise interactions of the representations of neighbor nodes. We term this framework the Bilinear Graph Neural Network (BGNN), which improves GNN representation ability with bilinear interactions between neighbor nodes. In particular, we specify two BGNN models named BGCN and BGAT, based on the well-known GCN and GAT, respectively. Empirical results on three public benchmarks of semi-supervised node classification verify the effectiveness of BGNN: BGCN (BGAT) outperforms GCN (GAT) by 1.6% (1.5%) in classification accuracy.
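The operator described in the abstract above, a weighted sum augmented with pairwise neighbor interactions, can be sketched in a few lines. This is an illustrative simplification rather than the BGNN definition. It omits learned weight matrices, takes the interaction term to be the mean element-wise product over neighbor pairs, and combines the two terms with a hypothetical mixing weight `alpha`.

```python
from itertools import combinations


def weighted_sum(neighbors, weights):
    """Standard aggregation: weighted sum of neighbor feature vectors."""
    dim = len(neighbors[0])
    return [sum(w * h[k] for w, h in zip(weights, neighbors)) for k in range(dim)]


def pairwise_interaction(neighbors):
    """Mean element-wise product over all unordered pairs of neighbors."""
    dim = len(neighbors[0])
    pairs = list(combinations(neighbors, 2))
    if not pairs:
        return [0.0] * dim
    return [sum(a[k] * b[k] for a, b in pairs) / len(pairs) for k in range(dim)]


def bilinear_aggregate(neighbors, weights, alpha=0.5):
    """Blend the weighted sum with the pairwise interaction term.

    alpha is an assumed mixing weight: 0 recovers the plain weighted sum,
    1 uses only the pairwise interactions.
    """
    linear_part = weighted_sum(neighbors, weights)
    bilinear_part = pairwise_interaction(neighbors)
    return [(1 - alpha) * s + alpha * b for s, b in zip(linear_part, bilinear_part)]


# Toy usage: three neighbors with two features each and uniform weights.
toy_neighbors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
toy_representation = bilinear_aggregate(toy_neighbors, [1 / 3, 1 / 3, 1 / 3])
```

The element-wise product is nonzero only where both neighbors in a pair are active on the same feature, so the second term responds to co-occurrence, which a weighted sum alone cannot distinguish.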
Plain radiography is the first diagnostic imaging modality for avascular necrosis of the femoral head (AVNFH), so accurately staging AVNFH from a plain radiograph is critical yet challenging for orthopedists. We therefore propose a deep learning-based AVNFH diagnosis system (AVN-net). The proposed AVN-net reads plain radiographs of the pelvis, conducts diagnosis, and visualizes results automatically. Deep convolutional neural networks are trained to provide an end-to-end diagnosis solution covering four subtasks: femoral head detection, exam-view/sides identification, AVNFH diagnosis, and key clinical note generation. AVN-net obtains a state-of-the-art testing AUC of 0.95 (95% CI: 0.92-0.98) in AVNFH detection and significantly greater F1 scores (p<0.01) than less-to-moderately experienced orthopedists in all diagnostic tests. Furthermore, two real-world pilot studies were conducted, one on diagnosis support and one on education assistance, to assess the utility of AVN-net. The experimental results are promising. With the AVN-net diagnosis as a reference, the diagnostic accuracy and consistency of all orthopedists improved considerably while requiring only 1/4 of the time. Students who self-studied AVNFH diagnosis using AVN-net learned better and faster than the control group. To the best of our knowledge, this is the first study of the prospective use of a deep learning-based diagnosis system for AVNFH, with two pilot studies representing real-world application scenarios. We have demonstrated that the proposed AVN-net achieves expert-level AVNFH diagnosis performance, provides efficient support in clinical decision-making, and effectively passes clinical experience to students.