Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Weiwen Jiang

The Larger The Fairer? Small Neural Networks Can Achieve Fairness for Edge Devices

Feb 23, 2022
Yi Sheng, Junhuan Yang, Yawen Wu, Kevin Mao, Yiyu Shi, Jingtong Hu, Weiwen Jiang, Lei Yang

Figure 1 for The Larger The Fairer? Small Neural Networks Can Achieve Fairness for Edge Devices

Figure 2 for The Larger The Fairer? Small Neural Networks Can Achieve Fairness for Edge Devices

Figure 3 for The Larger The Fairer? Small Neural Networks Can Achieve Fairness for Edge Devices

Figure 4 for The Larger The Fairer? Small Neural Networks Can Achieve Fairness for Edge Devices

Along with the progress of AI democratization, neural networks are being deployed more frequently in edge devices for a wide range of applications. Fairness concerns gradually emerge in many applications, such as face recognition and mobile medical. One fundamental question arises: what will be the fairest neural architecture for edge devices? By examining the existing neural networks, we observe that larger networks typically are fairer. But, edge devices call for smaller neural architectures to meet hardware specifications. To address this challenge, this work proposes a novel Fairness- and Hardware-aware Neural architecture search framework, namely FaHaNa. Coupled with a model freezing approach, FaHaNa can efficiently search for neural networks with balanced fairness and accuracy, while guaranteed to meet hardware specifications. Results show that FaHaNa can identify a series of neural networks with higher fairness and accuracy on a dermatology dataset. Target edge devices, FaHaNa finds a neural architecture with slightly higher accuracy, 5.28x smaller size, 15.14% higher fairness score, compared with MobileNetV2; meanwhile, on Raspberry PI and Odroid XU-4, it achieves 5.75x and 5.79x speedup.

* Accepted by DAC'22

Via

Access Paper or Ask Questions

Automated Architecture Search for Brain-inspired Hyperdimensional Computing

Feb 11, 2022
Junhuan Yang, Yi Sheng, Sizhe Zhang, Ruixuan Wang, Kenneth Foreman, Mikell Paige, Xun Jiao, Weiwen Jiang, Lei Yang

Figure 1 for Automated Architecture Search for Brain-inspired Hyperdimensional Computing

Figure 2 for Automated Architecture Search for Brain-inspired Hyperdimensional Computing

Figure 3 for Automated Architecture Search for Brain-inspired Hyperdimensional Computing

Figure 4 for Automated Architecture Search for Brain-inspired Hyperdimensional Computing

This paper represents the first effort to explore an automated architecture search for hyperdimensional computing (HDC), a type of brain-inspired neural network. Currently, HDC design is largely carried out in an application-specific ad-hoc manner, which significantly limits its application. Furthermore, the approach leads to inferior accuracy and efficiency, which suggests that HDC cannot perform competitively against deep neural networks. Herein, we present a thorough study to formulate an HDC architecture search space. On top of the search space, we apply reinforcement-learning to automatically explore the HDC architectures. The searched HDC architectures show competitive performance on case studies involving a drug discovery dataset and a language recognition task. On the Clintox dataset, which tries to learn features from developed drugs that passed/failed clinical trials for toxicity reasons, the searched HDC architecture obtains the state-of-the-art ROC-AUC scores, which are 0.80% higher than the manually designed HDC and 9.75% higher than conventional neural networks. Similar results are achieved on the language recognition task, with 1.27% higher performance than conventional methods.

Via

Access Paper or Ask Questions

One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

Nov 03, 2021
Bingqian Lu, Jianyi Yang, Weiwen Jiang, Yiyu Shi, Shaolei Ren

Figure 1 for One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

Figure 2 for One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

Figure 3 for One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

Figure 4 for One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device has been commonly used in state of the art, this is a very time-consuming process, lacking scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity -- the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique to significantly boost the latency monotonicity. Finally, we validate our approach and conduct experiments with devices of different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device. GitHub: https://github.com/Ren-Research/OneProxy

* Proc. ACM Meas. Anal. Comput. Syst., vol. 5, no. 3, Article 34, December 2021
* Accepted by the ACM SIGMETRICS 2022. Published in the Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 5, no. 3, Article 34, December 2021. GitHub: https://github.com/Ren-Research/OneProxy

Via

Access Paper or Ask Questions

RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions

Oct 30, 2021
Sung-En Chang, Yanyu Li, Mengshu Sun, Weiwen Jiang, Sijia Liu, Yanzhi Wang, Xue Lin

Figure 1 for RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions

Figure 2 for RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions

Figure 3 for RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions

Figure 4 for RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions

This work proposes a novel Deep Neural Network (DNN) quantization framework, namely RMSMP, with a Row-wise Mixed-Scheme and Multi-Precision approach. Specifically, this is the first effort to assign mixed quantization schemes and multiple precisions within layers -- among rows of the DNN weight matrix, for simplified operations in hardware inference, while preserving accuracy. Furthermore, this paper makes a different observation from the prior work that the quantization error does not necessarily exhibit the layer-wise sensitivity, and actually can be mitigated as long as a certain portion of the weights in every layer are in higher precisions. This observation enables layer-wise uniformality in the hardware implementation towards guaranteed inference acceleration, while still enjoying row-wise flexibility of mixed schemes and multiple precisions to boost accuracy. The candidates of schemes and precisions are derived practically and effectively with a highly hardware-informative strategy to reduce the problem search space. With the offline determined ratio of different quantization schemes and precisions for all the layers, the RMSMP quantization algorithm uses the Hessian and variance-based method to effectively assign schemes and precisions for each row. The proposed RMSMP is tested for the image classification and natural language processing (BERT) applications and achieves the best accuracy performance among state-of-the-arts under the same equivalent precisions. The RMSMP is implemented on FPGA devices, achieving 3.65x speedup in the end-to-end inference time for ResNet-18 on ImageNet, compared with the 4-bit Fixed-point baseline.

* Accepted by International Conference on Computer Vision 2021 (ICCV 2021)

Via

Access Paper or Ask Questions

Detecting Gender Bias in Transformer-based Models: A Case Study on BERT

Oct 15, 2021
Bingbing Li, Hongwu Peng, Rajat Sainju, Junhuan Yang, Lei Yang, Yueying Liang, Weiwen Jiang, Binghui Wang, Hang Liu, Caiwen Ding

Figure 1 for Detecting Gender Bias in Transformer-based Models: A Case Study on BERT

Figure 2 for Detecting Gender Bias in Transformer-based Models: A Case Study on BERT

Figure 3 for Detecting Gender Bias in Transformer-based Models: A Case Study on BERT

Figure 4 for Detecting Gender Bias in Transformer-based Models: A Case Study on BERT

In this paper, we propose a novel gender bias detection method by utilizing attention map for transformer-based models. We 1) give an intuitive gender bias judgement method by comparing the different relation degree between the genders and the occupation according to the attention scores, 2) design a gender bias detector by modifying the attention module, 3) insert the gender bias detector into different positions of the model to present the internal gender bias flow, and 4) draw the consistent gender bias conclusion by scanning the entire Wikipedia, a BERT pretraining dataset. We observe that 1) the attention matrices, Wq and Wk introduce much more gender bias than other modules (including the embedding layer) and 2) the bias degree changes periodically inside of the model (attention matrix Q, K, V, and the remaining part of the attention layer (including the fully-connected layer, the residual connection, and the layer normalization module) enhance the gender bias while the averaged attentions reduces the bias).

Via

Access Paper or Ask Questions

RADARS: Memory Efficient Reinforcement Learning Aided Differentiable Neural Architecture Search

Sep 13, 2021
Zheyu Yan, Weiwen Jiang, Xiaobo Sharon Hu, Yiyu Shi

Figure 1 for RADARS: Memory Efficient Reinforcement Learning Aided Differentiable Neural Architecture Search

Figure 2 for RADARS: Memory Efficient Reinforcement Learning Aided Differentiable Neural Architecture Search

Figure 3 for RADARS: Memory Efficient Reinforcement Learning Aided Differentiable Neural Architecture Search

Figure 4 for RADARS: Memory Efficient Reinforcement Learning Aided Differentiable Neural Architecture Search

Differentiable neural architecture search (DNAS) is known for its capacity in the automatic generation of superior neural networks. However, DNAS based methods suffer from memory usage explosion when the search space expands, which may prevent them from running successfully on even advanced GPU platforms. On the other hand, reinforcement learning (RL) based methods, while being memory efficient, are extremely time-consuming. Combining the advantages of both types of methods, this paper presents RADARS, a scalable RL-aided DNAS framework that can explore large search spaces in a fast and memory-efficient manner. RADARS iteratively applies RL to prune undesired architecture candidates and identifies a promising subspace to carry out DNAS. Experiments using a workstation with 12 GB GPU memory show that on CIFAR-10 and ImageNet datasets, RADARS can achieve up to 3.41% higher accuracy with 2.5X search time reduction compared with a state-of-the-art RL-based method, while the two DNAS baselines cannot complete due to excessive memory usage or search time. To the best of the authors' knowledge, this is the first DNAS framework that can handle large search spaces with bounded memory usage.

Via

Access Paper or Ask Questions

Exploration of Quantum Neural Architecture by Mixing Quantum Neuron Designs

Sep 08, 2021
Zhepeng Wang, Zhiding Liang, Shanglin Zhou, Caiwen Ding, Jinjun Xiong, Yiyu Shi, Weiwen Jiang

Figure 1 for Exploration of Quantum Neural Architecture by Mixing Quantum Neuron Designs

Figure 2 for Exploration of Quantum Neural Architecture by Mixing Quantum Neuron Designs

Figure 3 for Exploration of Quantum Neural Architecture by Mixing Quantum Neuron Designs

Figure 4 for Exploration of Quantum Neural Architecture by Mixing Quantum Neuron Designs

With the constant increase of the number of quantum bits (qubits) in the actual quantum computers, implementing and accelerating the prevalent deep learning on quantum computers are becoming possible. Along with this trend, there emerge quantum neural architectures based on different designs of quantum neurons. A fundamental question in quantum deep learning arises: what is the best quantum neural architecture? Inspired by the design of neural architectures for classical computing which typically employs multiple types of neurons, this paper makes the very first attempt to mix quantum neuron designs to build quantum neural architectures. We observe that the existing quantum neuron designs may be quite different but complementary, such as neurons from variation quantum circuits (VQC) and Quantumflow. More specifically, VQC can apply real-valued weights but suffer from being extended to multiple layers, while QuantumFlow can build a multi-layer network efficiently, but is limited to use binary weights. To take their respective advantages, we propose to mix them together and figure out a way to connect them seamlessly without additional costly measurement. We further investigate the design principles to mix quantum neurons, which can provide guidance for quantum neural architecture exploration in the future. Experimental results demonstrate that the identified quantum neural architectures with mixed quantum neurons can achieve 90.62% of accuracy on the MNIST dataset, compared with 52.77% and 69.92% on the VQC and QuantumFlow, respectively.

Via

Access Paper or Ask Questions

Can Noise on Qubits Be Learned in Quantum Neural Network? A Case Study on QuantumFlow

Sep 08, 2021
Zhiding Liang, Zhepeng Wang, Junhuan Yang, Lei Yang, Jinjun Xiong, Yiyu Shi, Weiwen Jiang

Figure 1 for Can Noise on Qubits Be Learned in Quantum Neural Network? A Case Study on QuantumFlow

Figure 2 for Can Noise on Qubits Be Learned in Quantum Neural Network? A Case Study on QuantumFlow

Figure 3 for Can Noise on Qubits Be Learned in Quantum Neural Network? A Case Study on QuantumFlow

Figure 4 for Can Noise on Qubits Be Learned in Quantum Neural Network? A Case Study on QuantumFlow

In the noisy intermediate-scale quantum (NISQ) era, one of the key questions is how to deal with the high noise level existing in physical quantum bits (qubits). Quantum error correction is promising but requires an extensive number (e.g., over 1,000) of physical qubits to create one "perfect" qubit, exceeding the capacity of the existing quantum computers. This paper aims to tackle the noise issue from another angle: instead of creating perfect qubits for general quantum algorithms, we investigate the potential to mitigate the noise issue for dedicate algorithms. Specifically, this paper targets quantum neural network (QNN), and proposes to learn the errors in the training phase, so that the identified QNN model can be resilient to noise. As a result, the implementation of QNN needs no or a small number of additional physical qubits, which is more realistic for the near-term quantum computers. To achieve this goal, an application-specific compiler is essential: on the one hand, the error cannot be learned if the mapping from logical qubits to physical qubits exists randomness; on the other hand, the compiler needs to be efficient so that the lengthy training procedure can be completed in a reasonable time. In this paper, we utilize the recent QNN framework, QuantumFlow, as a case study. Experimental results show that the proposed approach can optimize QNN models for different errors in qubits, achieving up to 28% accuracy improvement compared with the model obtained by the error-agnostic training.

Via

Access Paper or Ask Questions

A Compression-Compilation Framework for On-mobile Real-time BERT Applications

Jun 06, 2021
Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

Figure 1 for A Compression-Compilation Framework for On-mobile Real-time BERT Applications

Figure 2 for A Compression-Compilation Framework for On-mobile Real-time BERT Applications

Figure 3 for A Compression-Compilation Framework for On-mobile Real-time BERT Applications

Figure 4 for A Compression-Compilation Framework for On-mobile Real-time BERT Applications

Transformer-based deep learning models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks. In this paper, we propose a compression-compilation co-design framework that can guarantee the identified model to meet both resource and real-time specifications of mobile devices. Our framework applies a compiler-aware neural architecture optimization method (CANAO), which can generate the optimal compressed model that balances both accuracy and latency. We are able to achieve up to 7.8x speedup compared with TensorFlow-Lite with only minor accuracy loss. We present two types of BERT applications on mobile devices: Question Answering (QA) and Text Generation. Both can be executed in real-time with latency as low as 45ms. Videos for demonstrating the framework can be found on https://www.youtube.com/watch?v=_WIRvK_2PZI

* arXiv admin note: substantial text overlap with arXiv:2009.06823

Via

Access Paper or Ask Questions