GPUs are widely used to accelerate machine learning training workloads. As modern machine learning models grow larger, they take longer to train, which increases GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the behavior of the target workload, GPOEO uses GPU performance counters. To reduce profiling overhead, it uses an analytical model to detect changes in the training iteration and collects performance counter data only when an iteration shift is detected. GPOEO employs multi-objective models based on gradient boosting, together with a local search algorithm, to find a trade-off between execution time and energy consumption. We evaluate GPOEO on 71 machine learning workloads from two AI benchmark suites running on an NVIDIA RTX3080Ti GPU. Compared with NVIDIA's default scheduling strategy, GPOEO achieves a mean energy saving of 16.2% with a modest average increase in execution time of 5.1%.
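The search step can be illustrated with a small sketch. It assumes a discrete grid of (SM clock level, memory clock level) configurations, a hypothetical `predict` function standing in for GPOEO's gradient-boosting time and energy models, and an objective (also an assumption) of minimizing predicted energy while keeping the predicted slowdown within a user-set bound. It is a plain hill-climbing local search, not GPOEO's actual algorithm, and `toy_predict` is a made-up analytic model.

```python
from typing import Callable, List, Tuple

Config = Tuple[int, int]  # (SM clock level, memory clock level); hypothetical index grid
Predictor = Callable[[Config], Tuple[float, float]]  # returns (time, energy)


def neighbours(cfg: Config, n_sm: int, n_mem: int) -> List[Config]:
    """All configurations one level away on either or both clock axes."""
    sm, mem = cfg
    return [
        (sm + dx, mem + dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0) and 0 <= sm + dx < n_sm and 0 <= mem + dy < n_mem
    ]


def local_search(predict: Predictor, start: Config, n_sm: int, n_mem: int,
                 max_slowdown: float = 0.05) -> Config:
    """Hill-climb towards lower predicted energy, rejecting configs that are too slow."""
    base_time, _ = predict(start)
    time_limit = base_time * (1.0 + max_slowdown)

    def score(cfg: Config) -> float:
        t, e = predict(cfg)
        return e if t <= time_limit else float("inf")  # infeasible configs never win

    current, current_score = start, score(start)
    while True:
        best = min(neighbours(current, n_sm, n_mem), key=score)
        best_score = score(best)
        if best_score >= current_score:  # no neighbour improves: local optimum
            return current
        current, current_score = best, best_score


def toy_predict(cfg: Config) -> Tuple[float, float]:
    """Made-up analytic stand-in for the learned time/energy models."""
    f_sm = 1.0 + 0.1 * cfg[0]   # hypothetical SM clock in GHz
    f_mem = 5.0 + 1.0 * cfg[1]  # hypothetical memory clock in GHz
    t = 1.0 / f_sm + 2.0 / f_mem
    power = 60.0 + 40.0 * f_sm ** 3 + 5.0 * f_mem
    return t, power * t
```

Calling `local_search(toy_predict, start=(9, 4), n_sm=10, n_mem=5)` starts from the highest clocks and steps down while each move strictly lowers predicted energy and the predicted slowdown stays within 5%. Because every accepted move strictly improves the score over a finite grid, the loop always terminates.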
Although considerable progress has been made in semantic scene understanding under clear weather, it remains a difficult problem under adverse weather conditions such as dense fog, owing to the uncertainty caused by imperfect observations. In addition, the difficulty of collecting and labeling foggy images hinders progress in this field. Given the success of semantic scene understanding under clear weather, it is reasonable to transfer knowledge learned from clear images to the foggy domain. The problem then becomes one of bridging the domain gap between clear and foggy images. Unlike previous methods that mainly focus on closing the domain gap caused by fog, either by defogging foggy images or by fogging clear images, we propose to narrow the domain gap by considering fog influence and style variation simultaneously. The motivation is our finding that the style-related gap and the fog-related gap can be separated and closed individually by adding an intermediate domain. We therefore propose a new pipeline that cumulatively adapts style, fog, and the dual factor (style and fog). Specifically, we devise a unified framework that disentangles the style factor and the fog factor separately, and then the dual factor, from images in different domains. Furthermore, we combine the disentanglement of the three factors with a novel cumulative loss so that they are thoroughly disentangled. Our method achieves state-of-the-art performance on three benchmarks and generalizes to rainy and snowy scenes.
Sparse representation-based classification (SRC) has attracted much attention by casting recognition as a simple linear regression problem. However, SRC methods are still limited by their need for enough labeled samples per category, their insufficient use of unlabeled samples, and the instability of the representation. To tackle these problems, we propose an unlabeled-data-driven inverse projection pseudo-full-space representation-based classification model with low-rank sparse constraints (LR-S-PFSRC). The proposed model aims to mine the hidden semantic information and intrinsic structural information of all available data, which makes it suitable for frontal face recognition with few labeled samples or with an imbalanced proportion of labeled to unlabeled samples. A mixed Gauss-Seidel and Jacobian ADMM algorithm is introduced to solve the model. The convergence, representation capability, and stability of the model are analyzed. Experiments on three public datasets show that the proposed LR-S-PFSRC model achieves stable results, especially when labeled and unlabeled samples are imbalanced.
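For reference, the classic SRC decision rule that the proposed model builds on can be sketched as follows. This is the plain baseline, not LR-S-PFSRC: it codes a test sample y over all training samples (the columns of D) by solving min_x ½‖y − Dx‖² + λ‖x‖₁, here with ISTA (an assumption; any Lasso solver would do), and assigns the class whose samples give the smallest reconstruction residual. The function names and the penalty `lam` are illustrative.

```python
import numpy as np


def ista_lasso(D: np.ndarray, y: np.ndarray, lam: float = 0.01, n_iter: int = 500) -> np.ndarray:
    """Minimise 0.5*||y - D x||^2 + lam*||x||_1 by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L, L = squared largest singular value of D
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))                        # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold (prox of l1)
    return x


def src_classify(D: np.ndarray, labels: np.ndarray, y: np.ndarray, lam: float = 0.01):
    """Classic SRC: sparse-code y over all training columns, pick the class with least residual."""
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm training samples (columns)
    y = y / np.linalg.norm(y)
    x = ista_lasso(D, y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]
```

Only the coefficients belonging to each class are used when computing that class's residual, which is what turns a regression solution into a classifier.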
We propose a nonparametric factorization approach for sparsely observed tensors. Here, sparsity does not mean that zero-valued entries are numerous or dominant. Rather, it means that very few entries are observed, and the observed fraction shrinks further as the tensor grows; this situation is ubiquitous in practice. Compared with existing work, our model not only leverages the structural information underlying the observed entry indices, but also provides extra interpretability and flexibility: it can simultaneously estimate a set of location factors, which capture intrinsic properties of the tensor nodes, and a set of sociability factors, which reflect how actively the nodes interact with others; users are free to choose a trade-off between the two types of factors. Specifically, we use hierarchical Gamma processes and Poisson random measures to construct a tensor-valued process, which can freely sample the two types of factors to generate tensors and always guarantees asymptotic sparsity. We then normalize the tensor process to obtain hierarchical Dirichlet processes for sampling each observed entry index, and use a Gaussian process to sample the entry value as a nonlinear function of the factors, so as to capture both the sparse structural properties and complex node relationships. For efficient inference, we use Dirichlet process properties over finite sample partitions, density transformations, and random features to develop a stochastic variational estimation algorithm. We demonstrate the advantages of our method on several benchmark datasets.
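The index-sampling idea can be illustrated with a finite toy approximation. Each mode gets independent Gamma-distributed node weights which, once normalized, form a Dirichlet-distributed probability vector, and each observed entry draws one node per mode from these vectors. This deliberately omits the hierarchy, the location and sociability factors, the Poisson random measure construction, and the Gaussian-process entry values; the independent per-mode weights, the function name, and the `concentration` parameter are all hypothetical simplifications. It only shows how a modest number of sampled indices covers a vanishing fraction of a large tensor.

```python
import numpy as np


def sample_entry_indices(dims, n_entries, concentration=5.0, seed=0):
    """Toy finite analogue of sampling observed entry indices from normalised Gamma weights.

    Independent per-mode weights are an illustrative simplification, not the paper's model.
    """
    rng = np.random.default_rng(seed)
    columns = []
    for n in dims:
        w = rng.gamma(shape=concentration / n, scale=1.0, size=n)  # node "activity" weights
        p = w / w.sum()                                            # normalised -> Dirichlet(concentration/n, ...)
        columns.append(rng.choice(n, size=n_entries, p=p))         # one node index per observed entry
    return np.stack(columns, axis=1)                               # shape (n_entries, n_modes)


dims = (100, 100, 100)
idx = sample_entry_indices(dims, n_entries=2000)
distinct = len({tuple(row) for row in idx.tolist()})
print(distinct, distinct / np.prod(dims))  # observed cells vs. total cells
```

Small Gamma shape parameters make a few nodes carry most of the weight, so repeated draws concentrate on popular nodes while the vast majority of cells remain unobserved.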
Soft actuators have shown great advantages in compliance and morphological matching for manipulating delicate objects and for inspection in confined spaces. There is an unmet need for a soft actuator that can provide torsional motion, e.g., to enlarge the workspace and increase the degrees of freedom. Towards this goal, we present origami-inspired soft pneumatic actuators (OSPAs) made from silicone. The prototype can output a rotation of more than one revolution (up to 435°), larger than previous counterparts. We describe the design and fabrication method, build kinematic and simulation models, and analyze and optimize the parameters. Finally, we demonstrate the potentially broad utility of OSPAs by integrating them into a gripper capable of simultaneously grasping and lifting fragile or flat objects, a versatile robot arm capable of picking and placing items at the right angle with the twisting actuators, and a soft snake robot capable of changing attitude and direction through torsion of the twisting actuators.
Graph neural networks (GNNs) are emerging as a powerful technique for modeling graph structures. Because real-world graph data are sparse, GNN performance is limited by the many sparse matrix multiplication (SpMM) operations involved in the computation. While the best sparse matrix storage format varies across input data, existing deep learning frameworks employ a single, static storage format, leaving much room for improvement. This paper investigates how the choice of sparse matrix storage format affects GNN performance. We observe that choosing a suitable format can significantly improve GNN training performance, but the right format depends on the input workload and can change as the GNN iterates over the input graph. We then develop a predictive model that dynamically chooses the sparse matrix storage format to be used by a GNN layer based on its input matrices. The model is first trained offline using training matrix samples, and the trained model can then be applied to any input matrix and any GNN kernel that performs SpMM. We implement our approach on top of PyTorch and apply it to 5 representative GNN models running on a multi-core CPU, using real-life and synthetic datasets. Experimental results show that our approach gives an average speedup of 1.17x (up to 3x) in GNN running time.
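The offline training stage can be sketched as follows: for each sample matrix, extract a few cheap structural features and label it with the storage format whose sparse-times-dense product ran fastest on this machine. The feature set, the CSR/CSC/COO candidate list, and the function names are assumptions for illustration, and the sketch uses SciPy's sparse matrices rather than PyTorch. The resulting (features, label) pairs could then train any off-the-shelf classifier, which is not shown.

```python
import time

import numpy as np
import scipy.sparse as sp

FORMATS = ("csr", "csc", "coo")  # candidate storage formats (illustrative subset)


def matrix_features(m: sp.spmatrix) -> np.ndarray:
    """Cheap structural features of a sparse matrix (hypothetical feature set)."""
    csr = m.tocsr()
    n_rows, n_cols = csr.shape
    row_nnz = np.diff(csr.indptr)  # non-zeros per row
    return np.array([n_rows, n_cols, csr.nnz, csr.nnz / (n_rows * n_cols),
                     row_nnz.mean(), row_nnz.std(), row_nnz.max()], dtype=float)


def fastest_format(m: sp.spmatrix, dense: np.ndarray, repeats: int = 3) -> str:
    """Label a matrix with the format whose sparse x dense product is fastest here."""
    timings = {}
    for fmt in FORMATS:
        a = m.asformat(fmt)  # convert once, outside the timed region
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            _ = a @ dense
            best = min(best, time.perf_counter() - start)
        timings[fmt] = best
    return min(timings, key=timings.get)


rng = np.random.default_rng(0)
training_set = []
for seed, density in enumerate((0.001, 0.01, 0.05)):
    m = sp.random(500, 500, density=density, format="csr", random_state=seed)
    h = rng.standard_normal((500, 32))  # stand-in for node embeddings
    training_set.append((matrix_features(m), fastest_format(m, h)))
```

Timing-based labels are machine-specific, which is one reason a learned predictor is attractive: once trained, it picks a format from features alone without running every candidate at inference time.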
Soft grippers are receiving growing attention due to their compliance-based interaction safety and dexterity. Hybrid grippers (soft actuators enhanced by rigid constraints) are a new trend in soft gripper design. With the right structural components actuated by soft actuators, they can achieve excellent grasping adaptability and payload while remaining easy to model and control with conventional kinematics. However, existing work has mostly focused on achieving superior payload and perception with simple planar workspaces, resulting in far less dexterity than conventional grippers. In this work, we take inspiration from the human metacarpophalangeal (MCP) joint and propose a new hybrid gripper design with 8 independent muscles. We show that adding the MCP complexity is critical in enabling a range of novel features in the hybrid gripper, including in-hand manipulation, lateral passive compliance, and new control modes. A prototype gripper was fabricated and tested on our proprietary dual-arm robot platform with vision-guided grasping. With very lightweight pneumatic bellows soft actuators, the gripper can grasp objects weighing over 25 times its own weight with lateral compliance. Using the dual-arm platform, we demonstrate highly anthropomorphic dexterous manipulation with two hybrid grippers, ranging from a tug-of-war on a rigid rod to passing a soft towel between the two grippers using in-hand manipulation. Alongside these novel features and performance specifications, we present the underlying modeling, actuation, control, and experimental validation details of the proposed hybrid gripper, offering a promising approach to achieving enhanced dexterity, strength, and compliance in robotic grippers.
Model-Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks. Despite its success, a critical challenge in MAML is computing the gradient with respect to the initialization through a long training trajectory for the sampled tasks, because the computation graph can grow rapidly and the computational cost becomes very high. To address this problem, we propose Adjoint MAML (A-MAML). We view gradient descent in the inner optimization as the evolution of an ordinary differential equation (ODE). To efficiently compute the gradient of the validation loss with respect to the initialization, we use the adjoint method to construct a companion, backward ODE. To obtain this gradient, we only need to run a standard ODE solver twice: once forward in time, evolving a long gradient-flow trajectory for the sampled task, and once backward, solving the adjoint ODE. We need not create or expand any intermediate computational graphs, adopt aggressive approximations, or impose proximal regularizers in the training loss. Our approach is cheap, accurate, and adaptable to different trajectory lengths. We demonstrate its advantages on both synthetic and real-world meta-learning tasks.
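The two-solve structure can be checked on a toy problem where everything has a closed form. Assume a quadratic training loss ½(θ − c)ᵀA(θ − c) with symmetric positive-definite A, so gradient flow is dθ/dt = −A(θ − c), and a validation loss ½‖θ(T) − v‖². The adjoint a(t) = ∂L_val/∂θ(t) obeys da/dt = Aᵀa with a(T) = θ(T) − v, and the meta-gradient with respect to θ(0) is a(0). The sketch integrates forward once, then integrates θ and a backward together with a fixed-step RK4 solver (an assumption), and compares the result with the exact answer e^(−AT)(θ(T) − v). It is not A-MAML's implementation, and all names are hypothetical.

```python
import numpy as np


def rk4_solve(f, y0, t0, t1, n_steps=400):
    """Fixed-step RK4 from t0 to t1 (t1 < t0 integrates backward in time)."""
    h = (t1 - t0) / n_steps
    y, t = np.asarray(y0, dtype=float), t0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y


def adjoint_meta_gradient(theta0, A, c, v, T=2.0):
    """Gradient of the validation loss at theta(T) w.r.t. the initialisation theta0."""
    d = theta0.size
    flow = lambda t, th: -A @ (th - c)          # gradient flow on the training loss
    theta_T = rk4_solve(flow, theta0, 0.0, T)   # solve 1: forward in time
    a_T = theta_T - v                           # dL_val/dtheta(T)

    def augmented(t, y):                        # theta and adjoint evolve together
        th, a = y[:d], y[d:]
        return np.concatenate([-A @ (th - c), A.T @ a])  # da/dt = -(df/dtheta)^T a = A^T a

    y_0 = rk4_solve(augmented, np.concatenate([theta_T, a_T]), T, 0.0)  # solve 2: backward
    return y_0[d:]


def exact_meta_gradient(theta0, A, c, v, T=2.0):
    """Closed form: theta(T) = c + e^{-AT}(theta0 - c), gradient = e^{-AT}^T (theta(T) - v)."""
    w, V = np.linalg.eigh(A)
    expm = V @ np.diag(np.exp(-w * T)) @ V.T    # e^{-AT} for symmetric A
    theta_T = c + expm @ (theta0 - c)
    return expm.T @ (theta_T - v)


A = np.array([[2.0, 0.5], [0.5, 1.0]])
c, v, theta0 = np.array([1.0, -1.0]), np.array([0.5, 0.5]), np.zeros(2)
print(adjoint_meta_gradient(theta0, A, c, v), exact_meta_gradient(theta0, A, c, v))
```

Note that the backward solve recomputes θ(t) by running the flow in reverse instead of storing the forward trajectory, which is what keeps memory constant in the trajectory length.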
Mitotic figure count is an important marker of tumor proliferation and has been shown to be associated with patient prognosis. Deep-learning-based mitotic figure detection methods have been used to automatically locate cells in mitosis in hematoxylin & eosin (H&E) stained images. However, model performance deteriorates because of the large variation in color tone and intensity across H&E images. In this work, we propose a two-stage mitotic figure detection framework that fuses a detector with a deep ensemble classification model. To alleviate the impact of color variation in H&E images, we use both stain normalization and data augmentation, helping the model learn color-irrelevant features. The proposed model obtains an F1 score of 0.7550 on the preliminary test set released by the MIDOG challenge.
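A minimal sketch of the two color-robustness steps, under stated assumptions: the abstract does not say which stain normalization method is used, so this uses a simplified Reinhard-style transfer that matches each channel's mean and standard deviation to a reference tile directly in RGB (Reinhard's original method works in the lαβ color space), plus a random per-channel contrast and brightness jitter as the augmentation. The function names and jitter ranges are hypothetical.

```python
import numpy as np


def normalize_to_reference(img: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Shift and scale each RGB channel of img to the reference tile's mean and std."""
    src, tgt = img.astype(np.float64), ref.astype(np.float64)
    mu_s, sd_s = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    mu_t, sd_t = tgt.mean(axis=(0, 1)), tgt.std(axis=(0, 1))
    out = (src - mu_s) / (sd_s + eps) * sd_t + mu_t
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def color_jitter(img: np.ndarray, rng: np.random.Generator,
                 scale: float = 0.1, shift: float = 10.0) -> np.ndarray:
    """Augmentation: random per-channel contrast (scale) and brightness (shift) change."""
    a = rng.uniform(1 - scale, 1 + scale, size=3)
    b = rng.uniform(-shift, shift, size=3)
    out = img.astype(np.float64) * a + b
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

In a training loop, one might normalize every tile to a fixed reference and then apply `color_jitter` per sample, so the model sees consistent stain statistics while still being exposed to residual color variation.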