Zheng Wang

Dynamic GPU Energy Optimization for Machine Learning Training Workloads

Jan 05, 2022
Farui Wang, Weizhe Zhang, Shichao Lai, Meng Hao, Zheng Wang

GPUs are widely used to accelerate the training of machine learning workloads. As modern machine learning models grow increasingly large, they take longer to train, leading to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the target workload behavior, GPOEO utilizes GPU performance counters. To reduce the performance counter profiling overhead, it uses an analytical model to detect the training iteration change and only collects performance counter data when an iteration shift is detected. GPOEO employs multi-objective models based on gradient boosting and a local search algorithm to find a trade-off between execution time and energy consumption. We evaluate GPOEO by applying it to 71 machine learning workloads from two AI benchmark suites running on an NVIDIA RTX3080Ti GPU. Compared with the NVIDIA default scheduling strategy, GPOEO delivers a mean energy saving of 16.2% with a modest average execution time increase of 5.1%.

* Accepted for publication in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS)
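
The time/energy trade-off search described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the clock range, the analytic `relative_time` and `relative_energy` stand-ins (GPOEO itself uses gradient-boosting predictors), the weight `ALPHA`, and the simple hill-climbing `local_search`. It is not the authors' implementation.

```python
from typing import Callable, Sequence

F_MAX_MHZ = 1800   # hypothetical maximum SM clock
ALPHA = 0.8        # hypothetical weight on energy versus execution time


def relative_time(clock_mhz: int) -> float:
    """Toy stand-in for a time predictor: runtime scales as 1/clock."""
    return F_MAX_MHZ / clock_mhz


def relative_energy(clock_mhz: int) -> float:
    """Toy stand-in for an energy predictor: (static + cubic dynamic power) * time."""
    x = clock_mhz / F_MAX_MHZ
    power = 0.3 + 0.7 * x ** 3
    return power * relative_time(clock_mhz)


def objective(clock_mhz: int) -> float:
    """Weighted sum of predicted energy and time, both relative to the maximum clock."""
    return ALPHA * relative_energy(clock_mhz) + (1.0 - ALPHA) * relative_time(clock_mhz)


def local_search(clocks: Sequence[int], cost: Callable[[int], float], start_index: int) -> int:
    """Hill-climb over adjacent clock settings and return the index of a local optimum."""
    best = start_index
    while True:
        neighbours = [i for i in (best - 1, best + 1) if 0 <= i < len(clocks)]
        if not neighbours:
            return best
        candidate = min(neighbours, key=lambda i: cost(clocks[i]))
        if cost(clocks[candidate]) >= cost(clocks[best]):
            return best
        best = candidate


CLOCKS_MHZ = list(range(600, 1801, 150))
best_index = local_search(CLOCKS_MHZ, objective, start_index=len(CLOCKS_MHZ) - 1)
print(CLOCKS_MHZ[best_index])  # prints 1350
```

Starting from the highest clock, the search steps down while the weighted objective keeps improving and stops at the first setting whose neighbours are no better. A larger `ALPHA` pushes the chosen clock lower, trading more execution time for energy.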

Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding

Dec 01, 2021
Xianzheng Ma, Zhixiang Wang, Yacheng Zhan, Yinqiang Zheng, Zheng Wang, Dengxin Dai, Chia-Wen Lin

Although considerable progress has been made in semantic scene understanding under clear weather, it remains a tough problem under adverse weather conditions, such as dense fog, due to the uncertainty caused by imperfect observations. Moreover, difficulties in collecting and labeling foggy images hinder progress in this field. Considering the success of semantic scene understanding under clear weather, we think it is reasonable to transfer knowledge learned from clear images to the foggy domain. The problem then becomes bridging the domain gap between clear images and foggy images. Unlike previous methods that mainly focus on closing the domain gap caused by fog -- defogging the foggy images or fogging the clear images -- we propose to alleviate the domain gap by considering fog influence and style variation simultaneously. The motivation is our finding that the style-related gap and the fog-related gap can be separated and closed individually by adding an intermediate domain. Thus, we propose a new pipeline to cumulatively adapt style, fog and the dual-factor (style and fog). Specifically, we devise a unified framework to disentangle the style factor and the fog factor separately, and then the dual-factor from images in different domains. Furthermore, we couple the disentanglement of the three factors with a novel cumulative loss to thoroughly disentangle them. Our method achieves state-of-the-art performance on three benchmarks and shows generalization ability in rainy and snowy scenes.

Stable and Compact Face Recognition via Unlabeled Data Driven Sparse Representation-Based Classification

Nov 04, 2021
Xiaohui Yang, Zheng Wang, Huan Wu, Licheng Jiao, Yiming Xu, Haolin Chen

Sparse representation-based classification (SRC) has attracted much attention by casting the recognition problem as a simple linear regression problem. SRC methods, however, are still limited by the need for enough labeled samples per category, insufficient use of unlabeled samples, and unstable representations. To tackle these problems, an unlabeled data driven inverse projection pseudo-full-space representation-based classification model is proposed with low-rank sparse constraints. The proposed model aims to mine the hidden semantic information and intrinsic structure information of all available data, making it suitable for frontal face recognition problems with few labeled samples and an imbalanced proportion of labeled to unlabeled samples. The mixed Gauss-Seidel and Jacobian ADMM algorithm is introduced to solve the model. The convergence, representation capability and stability of the model are analyzed. Experiments on three public datasets show that the proposed LR-S-PFSRC model achieves stable results, especially when the sample proportions are imbalanced.

* 43 pages, 10 figures, 3 tables
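
For readers unfamiliar with the baseline, here is a minimal sketch of classical SRC, not the LR-S-PFSRC model proposed in the paper. It codes a test sample over labeled training atoms with an L1 penalty, then picks the class whose atoms alone reconstruct the sample best. The ISTA solver, the penalty `lam`, and the tiny 3-dimensional atoms are assumptions chosen to keep the example self-contained.

```python
from math import inf, sqrt
from typing import List, Sequence

Vector = List[float]


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of the L1 norm for a single coefficient."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def reconstruct(atoms: Sequence[Vector], coeffs: Sequence[float]) -> Vector:
    """Linear combination of the dictionary atoms."""
    dim = len(atoms[0])
    return [sum(c * atom[i] for atom, c in zip(atoms, coeffs)) for i in range(dim)]


def sparse_code(atoms: Sequence[Vector], y: Vector, lam: float = 0.01, iters: int = 500) -> Vector:
    """ISTA for 0.5 * ||y - D x||^2 + lam * ||x||_1, where the atoms are the columns of D."""
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the gradient.
    step = 1.0 / sum(v * v for atom in atoms for v in atom)
    x = [0.0] * len(atoms)
    for _ in range(iters):
        recon = reconstruct(atoms, x)
        residual = [yi - ri for yi, ri in zip(y, recon)]
        grad = [-sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        x = [soft_threshold(xj - step * gj, step * lam) for xj, gj in zip(x, grad)]
    return x


def src_classify(atoms: Sequence[Vector], labels: Sequence[int], y: Vector) -> int:
    """Return the label whose atoms give the smallest reconstruction residual."""
    coeffs = sparse_code(atoms, y)
    best_label, best_residual = -1, inf
    for label in sorted(set(labels)):
        masked = [c if l == label else 0.0 for c, l in zip(coeffs, labels)]
        recon = reconstruct(atoms, masked)
        residual = sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label


# Hypothetical toy dictionary: two training samples per class.
ATOMS = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
LABELS = [0, 0, 1, 1]
print(src_classify(ATOMS, LABELS, [0.95, 0.05, 0.0]))  # prints 0
```

The dependence on having several labeled atoms per class is visible here: a class with too few atoms cannot reconstruct its own samples well, which is one of the limitations the abstract lists.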

Nonparametric Sparse Tensor Factorization with Hierarchical Gamma Processes

Nov 03, 2021
Conor Tillinghast, Zheng Wang, Shandian Zhe

We propose a nonparametric factorization approach for sparsely observed tensors. Here, sparsity does not mean that zero-valued entries are numerous or dominant. Rather, it implies that the observed entries are very few, and become proportionally fewer as the tensor grows; this is ubiquitous in practice. Compared with existing work, our model not only leverages the structural information underlying the observed entry indices, but also provides extra interpretability and flexibility -- it can simultaneously estimate a set of location factors about the intrinsic properties of the tensor nodes, and another set of sociability factors reflecting their extrovert activity in interacting with others; users are free to choose a trade-off between the two types of factors. Specifically, we use hierarchical Gamma processes and Poisson random measures to construct a tensor-valued process, which can freely sample the two types of factors to generate tensors and always guarantees asymptotic sparsity. We then normalize the tensor process to obtain hierarchical Dirichlet processes to sample each observed entry index, and use a Gaussian process to sample the entry value as a nonlinear function of the factors, so as to capture both the sparse structure properties and complex node relationships. For efficient inference, we use Dirichlet process properties over finite sample partitions, density transformations, and random features to develop a stochastic variational estimation algorithm. We demonstrate the advantage of our method on several benchmark datasets.

* 15 pages, 4 figures

Origami-inspired soft twisting actuator

Nov 03, 2021
Diancheng Li, Dongliang Fan, Renjie Zhu, Qiaozhi Lei, Yuxuan Liao, Xin Yang, Yang Pan, Zheng Wang, Yang Wu, Sicong Liu, Hongqiang Wang

Soft actuators have shown great advantages in compliance and morphological matching for manipulating delicate objects and inspecting confined spaces. There is an unmet need for a soft actuator that can provide torsional motion to, for example, enlarge the working space and increase the degrees of freedom. Towards this goal, we present origami-inspired soft pneumatic actuators (OSPAs) made from silicone. The prototype can output a rotation of more than one revolution (up to 435°), larger than previous counterparts. We describe the design and fabrication method, build the kinematics models and simulation models, and analyze and optimize the parameters. Finally, we demonstrate the potentially extensive utility of OSPAs through their integration into a gripper capable of simultaneously grasping and lifting fragile or flat objects, a versatile robot arm capable of picking and placing items at the right angle with the twisting actuators, and a soft snake robot capable of changing attitude and direction by torsion of the twisting actuators.

* 9 figures

Optimizing Sparse Matrix Multiplications for Graph Neural Networks

Oct 30, 2021
Shenghao Qiu, You Liang, Zheng Wang

Graph neural networks (GNNs) are emerging as a powerful technique for modeling graph structures. Due to the sparsity of real-world graph data, GNN performance is limited by the extensive sparse matrix multiplication (SpMM) operations involved in computation. While the right sparse matrix storage format varies across input data, existing deep learning frameworks employ a single, static storage format, leaving much room for improvement. This paper investigates how the choice of sparse matrix storage format affects GNN performance. We observe that choosing a suitable sparse matrix storage format can significantly improve GNN training performance, but the right format depends on the input workloads and can change as the GNN iterates over the input graph. We then develop a predictive model to dynamically choose the sparse matrix storage format to be used by a GNN layer based on the input matrices. Our model is first trained offline using training matrix samples, and the trained model can be applied to any input matrix and GNN kernel with SpMM computation. We implement our approach on top of PyTorch and apply it to 5 representative GNN models running on a multi-core CPU using real-life and synthetic datasets. Experimental results show that our approach gives an average speedup of 1.17x (up to 3x) for GNN running time.
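
To make the storage-format question concrete, the sketch below computes the same sparse-times-dense product from two common layouts, COO (coordinate triples) and CSR (compressed rows). It is a plain-Python illustration under assumed toy data, not the paper's PyTorch implementation or its predictive model. The two kernels give identical results but traverse memory differently, which is why the faster format can depend on the input matrix.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def coo_spmm(rows: Sequence[int], cols: Sequence[int], vals: Sequence[float],
             dense: Matrix, n_rows: int) -> Matrix:
    """Multiply a COO sparse matrix by a dense matrix, one nonzero at a time."""
    n_feat = len(dense[0])
    out = [[0.0] * n_feat for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, vals):
        for j in range(n_feat):
            out[r][j] += v * dense[c][j]
    return out


def coo_to_csr(rows: Sequence[int], cols: Sequence[int], vals: Sequence[float],
               n_rows: int) -> Tuple[List[int], List[int], List[float]]:
    """Convert COO triples to CSR arrays (row pointer, column indices, values)."""
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]
    order = sorted(range(len(rows)), key=lambda k: (rows[k], cols[k]))
    return indptr, [cols[k] for k in order], [vals[k] for k in order]


def csr_spmm(indptr: Sequence[int], indices: Sequence[int], data: Sequence[float],
             dense: Matrix) -> Matrix:
    """Multiply a CSR sparse matrix by a dense matrix, one output row at a time."""
    n_feat = len(dense[0])
    out = []
    for i in range(len(indptr) - 1):
        row = [0.0] * n_feat
        for k in range(indptr[i], indptr[i + 1]):
            for j in range(n_feat):
                row[j] += data[k] * dense[indices[k]][j]
        out.append(row)
    return out


# Hypothetical 3-node path graph (0 - 1 - 2) with 2 features per node.
ROWS, COLS, VALS = [0, 1, 1, 2], [1, 0, 2, 1], [1.0, 1.0, 1.0, 1.0]
FEATURES = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

coo_result = coo_spmm(ROWS, COLS, VALS, FEATURES, n_rows=3)
csr_result = csr_spmm(*coo_to_csr(ROWS, COLS, VALS, n_rows=3), FEATURES)
print(coo_result)  # prints [[0.0, 1.0], [3.0, 2.0], [0.0, 1.0]]
```

CSR groups the nonzeros of each row, so each output row is written once; COO scatters updates in nonzero order but needs no row pointer and is cheap to build.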

A Soft-Rigid Hybrid Gripper with Lateral Compliance and Dexterous In-hand Manipulation

Oct 19, 2021
Wenpei Zhu, Chenghua Lu, Qule Zheng, Zhonggui Fang, Haichuan Che, Kailuan Tang, Mingchao Zhu, Sicong Liu, Zheng Wang

Soft grippers are receiving growing attention due to their compliance-based interactive safety and dexterity. Hybrid grippers (soft actuators enhanced by rigid constraints) are a new trend in soft gripper design. With the right structural components actuated by soft actuators, they can achieve excellent grasping adaptability and payload, while also being easy to model and control with conventional kinematics. However, existing works have mostly focused on achieving superior payload and perception with simple planar workspaces, resulting in far less dexterity compared with conventional grippers. In this work, we took inspiration from the human metacarpophalangeal (MCP) joint and proposed a new hybrid gripper design with 8 independent muscles. It was shown that adding the MCP complexity was critical in enabling a range of novel features in the hybrid gripper, including in-hand manipulation, lateral passive compliance, as well as new control modes. A prototype gripper was fabricated and tested on our proprietary dual-arm robot platform with vision-guided grasping. With very lightweight pneumatic bellows soft actuators, the gripper could grasp objects over 25 times its own weight with lateral compliance. Using the dual-arm platform, highly anthropomorphic dexterous manipulations were demonstrated using two hybrid grippers, from tug-of-war on a rigid rod to passing a soft towel between two grippers using in-hand manipulation. Alongside the novel features and performance specifications of the proposed hybrid gripper, the underlying modeling, actuation, control, and experimental validation details were also presented, offering a promising approach to achieving enhanced dexterity, strength, and compliance in robotic grippers.

Meta-Learning with Adjoint Methods

Oct 16, 2021
Shibo Li, Zheng Wang, Akil Narayan, Robert Kirby, Shandian Zhe

Model Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks. Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t. the initialization of a long training trajectory for the sampled tasks, because the computation graph can rapidly explode and the computational cost is very high. To address this problem, we propose Adjoint MAML (A-MAML). We view gradient descent in the inner optimization as the evolution of an Ordinary Differential Equation (ODE). To efficiently compute the gradient of the validation loss w.r.t. the initialization, we use the adjoint method to construct a companion, backward ODE. To obtain the gradient w.r.t. the initialization, we only need to run the standard ODE solver twice -- once forward in time, evolving a long trajectory of gradient flow for the sampled task, and once backward, solving the adjoint ODE. We need not create or expand any intermediate computational graphs, adopt aggressive approximations, or impose proximal regularizers in the training loss. Our approach is cheap, accurate, and adaptable to different trajectory lengths. We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks.
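
The forward/backward idea can be sketched on a one-parameter toy problem. The inner and validation losses, the explicit Euler integrator (standing in for a "standard ODE solver"), the horizon, and the step count are all assumptions for illustration. This is not the A-MAML code.

```python
def train_grad(theta: float) -> float:
    """Gradient of a toy inner loss 0.5*(theta - 0.5)**2 + 0.25*theta**4 (an assumption)."""
    return (theta - 0.5) + theta ** 3


def train_hess(theta: float) -> float:
    """Second derivative of the same toy inner loss."""
    return 1.0 + 3.0 * theta ** 2


def val_grad(theta: float) -> float:
    """Gradient of a toy validation loss 0.5*theta**2 (an assumption)."""
    return theta


def forward_flow(theta0: float, horizon: float = 1.0, steps: int = 10000) -> float:
    """Integrate the gradient flow d(theta)/dt = -train_grad(theta) forward with Euler steps."""
    h = horizon / steps
    theta = theta0
    for _ in range(steps):
        theta -= h * train_grad(theta)
    return theta


def adjoint_meta_gradient(theta0: float, horizon: float = 1.0, steps: int = 10000) -> float:
    """Gradient of the validation loss w.r.t. theta0 via a backward adjoint solve.

    The adjoint lam satisfies d(lam)/dt = train_hess(theta) * lam with
    lam(T) = val_grad(theta(T)). The state theta is re-integrated backward
    alongside lam, so no forward trajectory is stored.
    """
    h = horizon / steps
    theta = forward_flow(theta0, horizon, steps)
    lam = val_grad(theta)
    for _ in range(steps):
        previous_theta = theta + h * train_grad(theta)   # step the flow from t to t - h
        lam = lam - h * train_hess(theta) * lam          # step the adjoint from t to t - h
        theta = previous_theta
    return lam


def finite_difference_gradient(theta0: float, eps: float = 1e-5) -> float:
    """Reference value: central difference of the validation loss through the forward solve."""
    upper = 0.5 * forward_flow(theta0 + eps) ** 2
    lower = 0.5 * forward_flow(theta0 - eps) ** 2
    return (upper - lower) / (2.0 * eps)


print(adjoint_meta_gradient(1.0), finite_difference_gradient(1.0))
```

The two printed values should agree up to discretization error. The backward pass needs only the current state and adjoint, which is the memory advantage the abstract refers to when it says no computational graph has to be expanded.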

Detecting Mitosis against Domain Shift using a Fused Detector and Deep Ensemble Classification Model for MIDOG Challenge

Aug 31, 2021
Jingtang Liang, Cheng Wang, Yujie Cheng, Zheng Wang, Fang Wang, Liyu Huang, Zhibin Yu, Yubo Wang

Mitotic figure count is an important marker of tumor proliferation and has been shown to be associated with patients' prognosis. Deep learning based mitotic figure detection methods have been used to automatically locate cells in mitosis in hematoxylin & eosin (H&E) stained images. However, model performance deteriorates due to the large variation of color tone and intensity in H&E images. In this work, we propose a two-stage mitotic figure detection framework that fuses a detector with a deep ensemble classification model. To alleviate the impact of color variation in H&E images, we use both stain normalization and data augmentation, helping the model learn color-invariant features. The proposed model obtains an F1 score of 0.7550 on the preliminary testing set released by the MIDOG challenge.
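
For reference, the F1 score reported above is the harmonic mean of precision and recall. The small sketch below computes it from detection counts; the counts in the example call are made up for illustration and are not taken from the MIDOG evaluation.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0  # convention chosen here for the case with no detections and no targets
    return 2 * true_positives / denominator


# Hypothetical counts: 8 correct detections, 2 spurious detections, 2 missed mitoses.
print(f1_score(true_positives=8, false_positives=2, false_negatives=2))  # prints 0.8
```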
