Automated deep neural network architecture design has received a significant amount of recent attention. However, this attention has not been equally shared by one of the fundamental building blocks of a deep neural network, the neurons. In this study, we propose PolyNeuron, a novel automatic neuron discovery approach based on learned polyharmonic spline activations. More specifically, PolyNeuron revolves around learning polyharmonic splines, characterized by a set of control points, that represent the activation functions of the neurons in a deep neural network. A relaxed variant of PolyNeuron, which we term PolyNeuron-R, loosens the constraints imposed by PolyNeuron to reduce the computational complexity for discovering the neuron activation functions in an automated manner. Experiments show both PolyNeuron and PolyNeuron-R lead to networks that have improved or comparable performance on multiple network architectures (LeNet-5 and ResNet-20) using different datasets (MNIST and CIFAR10). As such, automatic neuron discovery approaches such as PolyNeuron is a worthy direction to explore.
Supervised dimensionality reduction strategies have been of great interest. However, current supervised dimensionality reduction approaches are difficult to scale for situations characterized by large datasets given the high computational complexities associated with such methods. While stochastic approximation strategies have been explored for unsupervised dimensionality reduction to tackle this challenge, such approaches are not well-suited for accelerating computational speed for supervised dimensionality reduction. Motivated to tackle this challenge, in this study we explore a novel direction of directly learning optimal class-aware embeddings in a supervised manner via the notion of supervised random projections (SRP). The key idea behind SRP is that, rather than performing spectral decomposition (or approximations thereof) which are computationally prohibitive for large-scale data, we instead perform a direct decomposition by leveraging kernel approximation theory and the symmetry of the Hilbert-Schmidt Independence Criterion (HSIC) measure of dependence between the embedded data and the labels. Experimental results on five different synthetic and real-world datasets demonstrate that the proposed SRP strategy for class-aware embedding learning can be very promising in producing embeddings that are highly competitive with existing supervised dimensionality reduction methods (e.g., SPCA and KSPCA) while achieving 1-2 orders of magnitude better computational performance. As such, such an efficient approach to learning embeddings for dimensionality reduction can be a powerful tool for large-scale data analysis and visualization.
The current trade-off between depth and computational cost makes it difficult to adopt deep neural networks for many industrial applications, especially when computing power is limited. Here, we are inspired by the idea that, while deeper embeddings are needed to discriminate difficult samples, a large number of samples can be well discriminated via much shallower embeddings. In this study, we introduce the concept of decision gates (d-gate), modules trained to decide whether a sample needs to be projected into a deeper embedding or if an early prediction can be made at the d-gate, thus enabling the computation of dynamic representations at different depths. The proposed d-gate modules can be integrated with any deep neural network and reduces the average computational cost of the deep neural networks while maintaining modeling accuracy. Experimental results show that leveraging the proposed d-gate modules led to a ~38% speed-up and ~39% FLOPS reduction on ResNet-101 and ~46% speed-up and ~36% FLOPS reduction on DenseNet-201 trained on the CIFAR10 dataset with only ~2\% drop in accuracy.
Under the impact of global climate changes and human activities, harmful algae blooms in surface waters have become a growing concern due to negative impacts on water related industries. Therefore, reliable and cost effective methods of quantifying the type and concentration of threshold levels of algae cells has become critical for ensuring successful water management. In this work, we present SAMSON, an innovative system to automatically classify multiple types of algae from different phyla groups by combining standard morphological features with their multi-wavelength signals. Two phyla with focused investigation in this study are the Cyanophyta phylum (blue-green algae), and the Chlorophyta phylum (green algae). We use a custom-designed microscopy imaging system which is configured to image water samples at two fluorescent wavelengths and seven absorption wavelengths using discrete-wavelength high-powered light emitting diodes (LEDs). Powered by computer vision and machine learning, we investigate the possibility and effectiveness of automatic classification using a deep residual convolutional neural network. More specifically, a classification accuracy of 96% was achieved in an experiment conducted with six different algae types. This high level of accuracy was achieved using a deep residual convolutional neural network that learns the optimal combination of spectral and morphological features. These findings elude to the possibility of leveraging a unique fingerprint of algae cell (i.e. spectral wavelengths and morphological features) to automatically distinguish different algae types. Our work herein demonstrates that, when coupled with multi-band fluorescence microscopy, machine learning algorithms can potentially be used as a robust and cost-effective tool for identifying and enumerating algae cells.
Despite showing state-of-the-art performance, deep learning for speech recognition remains challenging to deploy in on-device edge scenarios such as mobile and other consumer devices. Recently, there have been greater efforts in the design of small, low-footprint deep neural networks (DNNs) that are more appropriate for edge devices, with much of the focus on design principles for hand-crafting efficient network architectures. In this study, we explore a human-machine collaborative design strategy for building low-footprint DNN architectures for speech recognition through a marriage of human-driven principled network design prototyping and machine-driven design exploration. The efficacy of this design strategy is demonstrated through the design of a family of highly-efficient DNNs (nicknamed EdgeSpeechNets) for limited-vocabulary speech recognition. Experimental results using the Google Speech Commands dataset for limited-vocabulary speech recognition showed that EdgeSpeechNets have higher accuracies than state-of-the-art DNNs (with the best EdgeSpeechNet achieving ~97% accuracy), while achieving significantly smaller network sizes (as much as 7.8x smaller) and lower computational cost (as much as 36x fewer multiply-add operations, 10x lower prediction latency, and 16x smaller memory footprint on a Motorola Moto E phone), making them very well-suited for on-device edge voice interface applications.
Traffic sign recognition is a very important computer vision task for a number of real-world applications such as intelligent transportation surveillance and analysis. While deep neural networks have been demonstrated in recent years to provide state-of-the-art performance traffic sign recognition, a key challenge for enabling the widespread deployment of deep neural networks for embedded traffic sign recognition is the high computational and memory requirements of such networks. As a consequence, there are significant benefits in investigating compact deep neural network architectures for traffic sign recognition that are better suited for embedded devices. In this paper, we introduce MicronNet, a highly compact deep convolutional neural network for real-time embedded traffic sign recognition designed based on macroarchitecture design principles (e.g., spectral macroarchitecture augmentation, parameter precision optimization, etc.) as well as numerical microarchitecture optimization strategies. The resulting overall architecture of MicronNet is thus designed with as few parameters and computations as possible while maintaining recognition performance, leading to optimized information density of the proposed network. The resulting MicronNet possesses a model size of just ~1MB and ~510,000 parameters (~27x fewer parameters than state-of-the-art) while still achieving a human performance level top-1 accuracy of 98.9% on the German traffic sign recognition benchmark. Furthermore, MicronNet requires just ~10 million multiply-accumulate operations to perform inference, and has a time-to-compute of just 32.19 ms on a Cortex-A53 high efficiency processor. These experimental results show that highly compact, optimized deep neural network architectures can be designed for real-time traffic sign recognition that are well-suited for embedded scenarios.
The tremendous potential exhibited by deep learning is often offset by architectural and computational complexity, making widespread deployment a challenge for edge scenarios such as mobile and other consumer devices. To tackle this challenge, we explore the following idea: Can we learn generative machines to automatically generate deep neural networks with efficient network architectures? In this study, we introduce the idea of generative synthesis, which is premised on the intricate interplay between a generator-inquisitor pair that work in tandem to garner insights and learn to generate highly efficient deep neural networks that best satisfies operational requirements. What is most interesting is that, once a generator has been learned through generative synthesis, it can be used to generate not just one but a large variety of different, unique highly efficient deep neural networks that satisfy operational requirements. Experimental results for image classification, semantic segmentation, and object detection tasks illustrate the efficacy of generative synthesis in producing generators that automatically generate highly efficient deep neural networks (which we nickname FermiNets) with higher model efficiency and lower computational costs (reaching >10x more efficient and fewer multiply-accumulate operations than several tested state-of-the-art networks), as well as higher energy efficiency (reaching >4x improvements in image inferences per joule consumed on a Nvidia Tegra X2 mobile processor). As such, generative synthesis can be a powerful, generalized approach for accelerating and improving the building of deep neural networks for on-device edge scenarios.
Much of the focus in the design of deep neural networks has been on improving accuracy, leading to more powerful yet highly complex network architectures that are difficult to deploy in practical scenarios, particularly on edge devices such as mobile and other consumer devices given their high computational and memory requirements. As a result, there has been a recent interest in the design of quantitative metrics for evaluating deep neural networks that accounts for more than just model accuracy as the sole indicator of network performance. In this study, we continue the conversation towards universal metrics for evaluating the performance of deep neural networks for practical on-device edge usage. In particular, we propose a new balanced metric called NetScore, which is designed specifically to provide a quantitative assessment of the balance between accuracy, computational complexity, and network architecture complexity of a deep neural network, which is important for on-device edge operation. In what is one of the largest comparative analysis between deep neural networks in literature, the NetScore metric, the top-1 accuracy metric, and the popular information density metric were compared across a diverse set of 60 different deep convolutional neural networks for image classification on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012) dataset. The evaluation results across these three metrics for this diverse set of networks are presented in this study to act as a reference guide for practitioners in the field. The proposed NetScore metric, along with the other tested metrics, are by no means perfect, but the hope is to push the conversation towards better universal metrics for evaluating deep neural networks for use in practical on-device edge scenarios to help guide practitioners in model design for such scenarios.
While microscopic analysis of histopathological slides is generally considered as the gold standard method for performing cancer diagnosis and grading, the current method for analysis is extremely time consuming and labour intensive as it requires pathologists to visually inspect tissue samples in a detailed fashion for the presence of cancer. As such, there has been significant recent interest in computer aided diagnosis systems for analysing histopathological slides for cancer grading to aid pathologists to perform cancer diagnosis and grading in a more efficient, accurate, and consistent manner. In this work, we investigate and explore a deep triple-stream residual network (TriResNet) architecture for the purpose of tile-level histopathology grading, which is the critical first step to computer-aided whole-slide histopathology grading. In particular, the design mentality behind the proposed TriResNet network architecture is to facilitate for the learning of a more diverse set of quantitative features to better characterize the complex tissue characteristics found in histopathology samples. Experimental results on two widely-used computer-aided histopathology benchmark datasets (CAMELYON16 dataset and Invasive Ductal Carcinoma (IDC) dataset) demonstrated that the proposed TriResNet network architecture was able to achieve noticeably improved accuracies when compared with two other state-of-the-art deep convolutional neural network architectures. Based on these promising results, the hope is that the proposed TriResNet network architecture could become a useful tool to aiding pathologists increase the consistency, speed, and accuracy of the histopathology grading process.
Vehicle Make and Model Recognition (MMR) systems provide a fully automatic framework to recognize and classify different vehicle models. Several approaches have been proposed to address this challenge, however they can perform in restricted conditions. Here, we formulate the vehicle make and model recognition as a fine-grained classification problem and propose a new configurable on-road vehicle make and model recognition framework. We benefit from the unsupervised feature learning methods and in more details we employ Locality constraint Linear Coding (LLC) method as a fast feature encoder for encoding the input SIFT features. The proposed method can perform in real environments of different conditions. This framework can recognize fifty models of vehicles and has an advantage to classify every other vehicle not belonging to one of the specified fifty classes as an unknown vehicle. The proposed MMR framework can be configured to become faster or more accurate based on the application domain. The proposed approach is examined on two datasets including Iranian on-road vehicle dataset and CompuCar dataset. The Iranian on-road vehicle dataset contains images of 50 models of vehicles captured in real situations by traffic cameras in different weather and lighting conditions. Experimental results show superiority of the proposed framework over the state-of-the-art methods on Iranian on-road vehicle datatset and comparable results on CompuCar dataset with 97.5% and 98.4% accuracies, respectively.