Recent research has shown that it is challenging to detect out-of-distribution (OOD) data in deep generative models including flow-based models and variational autoencoders (VAEs). In this paper, we prove a theorem that, for a well-trained flow-based model, the distance between the distribution of representations of an OOD dataset and prior can be large enough, as long as the distance between the distributions of the training dataset and the OOD dataset is large enough. Furthermore, our observation shows that, for flow-based model and VAE with factorized prior, the representations of OOD datasets are more correlated than that of the training dataset. Based on our theorem and observation, we propose detecting OOD data according to the total correlation of representations in flow-based model and VAE. Experimental results show that our method can achieve nearly 100\% AUROC for all the widely used benchmarks and has robustness against data manipulation. While the state-of-the-art method performs not better than random guessing for challenging problems and can be fooled by data manipulation in almost all cases.
In recent years, with the trend of applying deep learning (DL) in high performance scientific computing, the unique characteristics of emerging DL workloads in HPC raise great challenges in designing, implementing HPC AI systems. The community needs a new yard stick for evaluating the future HPC systems. In this paper, we propose HPC AI500 --- a benchmark suite for evaluating HPC systems that running scientific DL workloads. Covering the most representative scientific fields, each workload from HPC AI500 is based on real-world scientific DL applications. Currently, we choose 14 scientific DL benchmarks from perspectives of application scenarios, data sets, and software stack. We propose a set of metrics for comprehensively evaluating the HPC AI systems, considering both accuracy, performance as well as power and cost. We provide a scalable reference implementation of HPC AI500. HPC AI500 is a part of the open-source AIBench project, the specification and source code are publicly available from \url{http://www.benchcouncil.org/AIBench/index.html}.
In this paper, we propose a Distributed Intelligent Video Surveillance (DIVS) system using Deep Learning (DL) algorithms and deploy it in an edge computing environment. We establish a multi-layer edge computing architecture and a distributed DL training model for the DIVS system. The DIVS system can migrate computing workloads from the network center to network edges to reduce huge network communication overhead and provide low-latency and accurate video analysis solutions. We implement the proposed DIVS system and address the problems of parallel training, model synchronization, and workload balancing. Task-level parallel and model-level parallel training methods are proposed to further accelerate the video analysis process. In addition, we propose a model parameter updating method to achieve model synchronization of the global DL model in a distributed EC environment. Moreover, a dynamic data migration approach is proposed to address the imbalance of workload and computational power of edge nodes. Experimental results showed that the EC architecture can provide elastic and scalable computing power, and the proposed DIVS system can efficiently handle video surveillance and analysis tasks.
In the era of big data, practical applications in various domains continually generate large-scale time-series data. Among them, some data show significant or potential periodicity characteristics, such as meteorological and financial data. It is critical to efficiently identify the potential periodic patterns from massive time-series data and provide accurate predictions. In this paper, a Periodicity-based Parallel Time Series Prediction (PPTSP) algorithm for large-scale time-series data is proposed and implemented in the Apache Spark cloud computing environment. To effectively handle the massive historical datasets, a Time Series Data Compression and Abstraction (TSDCA) algorithm is presented, which can reduce the data scale as well as accurately extracting the characteristics. Based on this, we propose a Multi-layer Time Series Periodic Pattern Recognition (MTSPPR) algorithm using the Fourier Spectrum Analysis (FSA) method. In addition, a Periodicity-based Time Series Prediction (PTSP) algorithm is proposed. Data in the subsequent period are predicted based on all previous period models, in which a time attenuation factor is introduced to control the impact of different periods on the prediction results. Moreover, to improve the performance of the proposed algorithms, we propose a parallel solution on the Apache Spark platform, using the Streaming real-time computing module. To efficiently process the large-scale time-series datasets in distributed computing environments, Distributed Streams (DStreams) and Resilient Distributed Datasets (RDDs) are used to store and calculate these datasets. Extensive experimental results show that our PPTSP algorithm has significant advantages compared with other algorithms in terms of prediction accuracy and performance.
It is crucial to provide compatible treatment schemes for a disease according to various symptoms at different stages. However, most classification methods might be ineffective in accurately classifying a disease that holds the characteristics of multiple treatment stages, various symptoms, and multi-pathogenesis. Moreover, there are limited exchanges and cooperative actions in disease diagnoses and treatments between different departments and hospitals. Thus, when new diseases occur with atypical symptoms, inexperienced doctors might have difficulty in identifying them promptly and accurately. Therefore, to maximize the utilization of the advanced medical technology of developed hospitals and the rich medical knowledge of experienced doctors, a Disease Diagnosis and Treatment Recommendation System (DDTRS) is proposed in this paper. First, to effectively identify disease symptoms more accurately, a Density-Peaked Clustering Analysis (DPCA) algorithm is introduced for disease-symptom clustering. In addition, association analyses on Disease-Diagnosis (D-D) rules and Disease-Treatment (D-T) rules are conducted by the Apriori algorithm separately. The appropriate diagnosis and treatment schemes are recommended for patients and inexperienced doctors, even if they are in a limited therapeutic environment. Moreover, to reach the goals of high performance and low latency response, we implement a parallel solution for DDTRS using the Apache Spark cloud platform. Extensive experimental results demonstrate that the proposed DDTRS realizes disease-symptom clustering effectively and derives disease treatment recommendations intelligently and accurately.
Benefitting from large-scale training datasets and the complex training network, Convolutional Neural Networks (CNNs) are widely applied in various fields with high accuracy. However, the training process of CNNs is very time-consuming, where large amounts of training samples and iterative operations are required to obtain high-quality weight parameters. In this paper, we focus on the time-consuming training process of large-scale CNNs and propose a Bi-layered Parallel Training (BPT-CNN) architecture in distributed computing environments. BPT-CNN consists of two main components: (a) an outer-layer parallel training for multiple CNN subnetworks on separate data subsets, and (b) an inner-layer parallel training for each subnetwork. In the outer-layer parallelism, we address critical issues of distributed and parallel computing, including data communication, synchronization, and workload balance. A heterogeneous-aware Incremental Data Partitioning and Allocation (IDPA) strategy is proposed, where large-scale training datasets are partitioned and allocated to the computing nodes in batches according to their computing power. To minimize the synchronization waiting during the global weight update process, an Asynchronous Global Weight Update (AGWU) strategy is proposed. In the inner-layer parallelism, we further accelerate the training process for each CNN subnetwork on each computer, where computation steps of convolutional layer and the local weight training are parallelized based on task-parallelism. We introduce task decomposition and scheduling strategies with the objectives of thread-level load balancing and minimum waiting time for critical paths. Extensive experimental results indicate that the proposed BPT-CNN effectively improves the training performance of CNNs while maintaining the accuracy.
Face multi-attribute prediction benefits substantially from multi-task learning (MTL), which learns multiple face attributes simultaneously to achieve shared or mutually related representations of different attributes. The most widely used MTL convolutional neural network is heuristically or empirically designed by sharing all of the convolutional layers and splitting at the fully connected layers for task-specific losses. However, it is improper to view all low and mid-level features for different attributes as being the same, especially when these attributes are only loosely related. In this paper, we propose a novel multi-attribute tensor correlation neural network (MTCN) for face attribute prediction. The structure shares the information in low-level features (e.g., the first two convolutional layers) but splits that in high-level features (e.g., from the third convolutional layer to the fully connected layer). At the same time, during high-level feature extraction, each subnetwork (e.g., Age-Net, Gender-Net, ..., and Smile-Net) excavates closely related features from other networks to enhance its features. Then, we project the features of the C9 layers of the fine-tuned subnetworks into a highly correlated space by using a novel tensor correlation analysis algorithm (NTCCA). The final face attribute prediction is made based on the correlation matrix. Experimental results on benchmarks with multiple face attributes (CelebA and LFWA) show that the proposed approach has superior performance compared to state-of-the-art methods.