Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Vijay Gadepally

Mathematics of Digital Hyperspace

Mar 28, 2021

Jeremy Kepner, Timothy Davis, Vijay Gadepally, Hayden Jananthan, Lauren Milechin

Figure 1 for Mathematics of Digital Hyperspace

Figure 2 for Mathematics of Digital Hyperspace

Figure 3 for Mathematics of Digital Hyperspace

Figure 4 for Mathematics of Digital Hyperspace

Abstract:Social media, e-commerce, streaming video, e-mail, cloud documents, web pages, traffic flows, and network packets fill vast digital lakes, rivers, and oceans that we each navigate daily. This digital hyperspace is an amorphous flow of data supported by continuous streams that stretch standard concepts of type and dimension. The unstructured data of digital hyperspace can be elegantly represented, traversed, and transformed via the mathematics of hypergraphs, hypersparse matrices, and associative array algebra. This paper explores a novel mathematical concept, the semilink, that combines pairs of semirings to provide the essential operations for graph analytics, database operations, and machine learning. The GraphBLAS standard currently supports hypergraphs, hypersparse matrices, the mathematics required for semilinks, and seamlessly performs graph, network, and matrix operations. With the addition of key based indices (such as pointers to strings) and semilinks, GraphBLAS can become a richer associative array algebra and be a plug-in replacement for spreadsheets, database tables, and data centric operating systems, enhancing the navigation of unstructured data found in digital hyperspace.

* 9 pages, 8 figures, 2 tables, accepted to GrAPL 2021. arXiv admin note: text overlap with arXiv:1807.03165, arXiv:2004.01181, arXiv:1909.05631, arXiv:1708.02937

Via

Access Paper or Ask Questions

Technical Report on Data Integration and Preparation

Mar 02, 2021

El Kindi Rezig, Michael Cafarella, Vijay Gadepally

Figure 1 for Technical Report on Data Integration and Preparation

Figure 2 for Technical Report on Data Integration and Preparation

Figure 3 for Technical Report on Data Integration and Preparation

Figure 4 for Technical Report on Data Integration and Preparation

Abstract:AI application developers typically begin with a dataset of interest and a vision of the end analytic or insight they wish to gain from the data at hand. Although these are two very important components of an AI workflow, one often spends the first few weeks (sometimes months) in the phase we refer to as data conditioning. This step typically includes tasks such as figuring out how to prepare data for analytics, dealing with inconsistencies in the dataset, and determining which algorithm (or set of algorithms) will be best suited for the application. Larger, faster, and messier datasets such as those from Internet of Things sensors, medical devices or autonomous vehicles only amplify these issues. These challenges, often referred to as the three Vs (volume, velocity, variety) of Big Data, require low-level tools for data management, preparation and integration. In most applications, data can come from structured and/or unstructured sources and often includes inconsistencies, formatting differences, and a lack of ground-truth labels. In this report, we highlight a number of tools that can be used to simplify data integration and preparation steps. Specifically, we focus on data integration tools and techniques, a deep dive into an exemplar data integration tool, and a deep-dive in the evolving field of knowledge graphs. Finally, we provide readers with a list of practical steps and considerations that they can use to simplify the data integration challenge. The goal of this report is to provide readers with a view of state-of-the-art as well as practical tips that can be used by data creators that make data integration more seamless.

Via

Access Paper or Ask Questions

Video Action Understanding: A Tutorial

Oct 13, 2020

Matthew Hutchinson, Vijay Gadepally

Figure 1 for Video Action Understanding: A Tutorial

Figure 2 for Video Action Understanding: A Tutorial

Figure 3 for Video Action Understanding: A Tutorial

Figure 4 for Video Action Understanding: A Tutorial

Abstract:Many believe that the successes of deep learning on image understanding problems can be replicated in the realm of video understanding. However, the span of video action problems and the set of proposed deep learning solutions is arguably wider and more diverse than those of their 2D image siblings. Finding, identifying, and predicting actions are a few of the most salient tasks in video action understanding. This tutorial clarifies a taxonomy of video action problems, highlights datasets and metrics used to baseline each problem, describes common data preparation methods, and presents the building blocks of state-of-the-art deep learning model architectures.

* Shortened version submitted to ACM CSUR

Via

Access Paper or Ask Questions

Survey of Machine Learning Accelerators

Sep 01, 2020

Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

Figure 1 for Survey of Machine Learning Accelerators

Figure 2 for Survey of Machine Learning Accelerators

Abstract:New machine learning accelerators are being announced and released each month for a variety of applications from speech recognition, video object detection, assisted driving, and many data center applications. This paper updates the survey of of AI accelerators and processors from last year's IEEE-HPEC paper. This paper collects and summarizes the current accelerators that have been publicly announced with performance and power consumption numbers. The performance and power values are plotted on a scatter graph and a number of dimensions and observations from the trends on this plot are discussed and analyzed. For instance, there are interesting trends in the plot regarding power consumption, numerical precision, and inference versus training. This year, there are many more announced accelerators that are implemented with many more architectures and technologies from vector engines, dataflow engines, neuromorphic designs, flash-based analog memory processing, and photonic-based processing.

* 12 pages, 2 figures, IEEE-HPEC conference, Waltham, MA, September 21-25, 2020. arXiv admin note: text overlap with arXiv:1908.11348

Via

Access Paper or Ask Questions

Accuracy and Performance Comparison of Video Action Recognition Approaches

Aug 20, 2020

Matthew Hutchinson, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Micheal Houle, Matthew Hubbell, Micheal Jones, Jeremy Kepner(+9 more)

Figure 1 for Accuracy and Performance Comparison of Video Action Recognition Approaches

Figure 2 for Accuracy and Performance Comparison of Video Action Recognition Approaches

Figure 3 for Accuracy and Performance Comparison of Video Action Recognition Approaches

Figure 4 for Accuracy and Performance Comparison of Video Action Recognition Approaches

Abstract:Over the past few years, there has been significant interest in video action recognition systems and models. However, direct comparison of accuracy and computational performance results remain clouded by differing training environments, hardware specifications, hyperparameters, pipelines, and inference methods. This article provides a direct comparison between fourteen off-the-shelf and state-of-the-art models by ensuring consistency in these training characteristics in order to provide readers with a meaningful comparison across different types of video action recognition algorithms. Accuracy of the models is evaluated using standard Top-1 and Top-5 accuracy metrics in addition to a proposed new accuracy metric. Additionally, we compare computational performance of distributed training from two to sixty-four GPUs on a state-of-the-art HPC system.

* Accepted for publication at IEEE HPEC 2020

Via

Access Paper or Ask Questions

Benchmarking network fabrics for data distributed training of deep neural networks

Aug 18, 2020

Siddharth Samsi, Andrew Prout, Michael Jones, Andrew Kirby, Bill Arcand, Bill Bergeron, David Bestor, Chansup Byun, Vijay Gadepally, Michael Houle(+9 more)

Figure 1 for Benchmarking network fabrics for data distributed training of deep neural networks

Figure 2 for Benchmarking network fabrics for data distributed training of deep neural networks

Figure 3 for Benchmarking network fabrics for data distributed training of deep neural networks

Figure 4 for Benchmarking network fabrics for data distributed training of deep neural networks

Abstract:Artificial Intelligence/Machine Learning applications require the training of complex models on large amounts of labelled data. The large computational requirements for training deep models have necessitated the development of new methods for faster training. One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes. This approach is simple to implement and supported by most of the commonly used machine learning frameworks. The data parallel approach leverages MPI for communicating gradients across all nodes. In this paper, we examine the effects of using different physical hardware interconnects and network-related software primitives for enabling data distributed deep learning. We compare the effect of using GPUDirect and NCCL on Ethernet and OmniPath fabrics. Our results show that using Ethernet-based networking in shared HPC systems does not have a significant effect on the training times for commonly used deep neural network architectures or traditional HPC applications such as Computational Fluid Dynamics.

* Accepted for publication at IEEE HPEC 2020

Via

Access Paper or Ask Questions

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks Via Nonlinear Multigrid

Jul 14, 2020

Andrew C. Kirby, Siddharth Samsi, Michael Jones, Albert Reuther, Jeremy Kepner, Vijay Gadepally

Figure 1 for Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks Via Nonlinear Multigrid

Figure 2 for Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks Via Nonlinear Multigrid

Figure 3 for Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks Via Nonlinear Multigrid

Figure 4 for Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks Via Nonlinear Multigrid

Abstract:A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs. This work demonstrates a 10.2x speedup over traditional layer-wise model parallelism techniques using the same number of compute units.

* 7 pages, 6 figures, submitted to 2020 IEEE High Performance Extreme Computing Conference

Via

Access Paper or Ask Questions

GraphChallenge.org Sparse Deep Neural Network Performance

Apr 06, 2020

Jeremy Kepner, Simon Alford, Vijay Gadepally, Michael Jones, Lauren Milechin, Albert Reuther, Ryan Robinett, Sid Samsi

Figure 1 for GraphChallenge.org Sparse Deep Neural Network Performance

Figure 2 for GraphChallenge.org Sparse Deep Neural Network Performance

Figure 3 for GraphChallenge.org Sparse Deep Neural Network Performance

Figure 4 for GraphChallenge.org Sparse Deep Neural Network Performance

Abstract:The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sparse AI systems. The sparse DNN challenge is based on a mathematically well-defined DNN inference computation and can be implemented in any programming environment. In 2019 several sparse DNN challenge submissions were received from a wide range of authors and organizations. This paper presents a performance analysis of the best performers of these submissions. These submissions show that their state-of-the-art sparse DNN execution time, $T_{\rm DNN}$, is a strong function of the number of DNN operations performed, $N_{\rm op}$. The sparse DNN challenge provides a clear picture of current sparse DNN systems and underscores the need for new innovations to achieve high performance on very large sparse DNNs.

* 7 pages, 7 figures, 80 references, to be submitted to IEEE HPEC 2020. This work reports new updated results on prior work reported in arXiv:1909.05631. arXiv admin note: substantial text overlap with arXiv:1807.03165, arXiv:1708.02937. arXiv admin note: text overlap with arXiv:2003.09269

Via

Access Paper or Ask Questions

Survey of Attacks and Defenses on Edge-Deployed Neural Networks

Nov 27, 2019

Mihailo Isakov, Vijay Gadepally, Karen M. Gettings, Michel A. Kinsy

Figure 1 for Survey of Attacks and Defenses on Edge-Deployed Neural Networks

Abstract:Deep Neural Network (DNN) workloads are quickly moving from datacenters onto edge devices, for latency, privacy, or energy reasons. While datacenter networks can be protected using conventional cybersecurity measures, edge neural networks bring a host of new security challenges. Unlike classic IoT applications, edge neural networks are typically very compute and memory intensive, their execution is data-independent, and they are robust to noise and faults. Neural network models may be very expensive to develop, and can potentially reveal information about the private data they were trained on, requiring special care in distribution. The hidden states and outputs of the network can also be used in reconstructing user inputs, potentially violating users' privacy. Furthermore, neural networks are vulnerable to adversarial attacks, which may cause misclassifications and violate the integrity of the output. These properties add challenges when securing edge-deployed DNNs, requiring new considerations, threat models, priorities, and approaches in securely and privately deploying DNNs to the edge. In this work, we cover the landscape of attacks on, and defenses, of neural networks deployed in edge devices and provide a taxonomy of attacks and defenses targeting edge DNNs.

* In the 2019 IEEE High Performance Extreme Computing Conference (HPEC), 2019

Via

Access Paper or Ask Questions

Sparse Deep Neural Network Graph Challenge

Sep 02, 2019

Jeremy Kepner, Simon Alford, Vijay Gadepally, Michael Jones, Lauren Milechin, Ryan Robinett, Sid Samsi

Figure 1 for Sparse Deep Neural Network Graph Challenge

Figure 2 for Sparse Deep Neural Network Graph Challenge

Figure 3 for Sparse Deep Neural Network Graph Challenge

Figure 4 for Sparse Deep Neural Network Graph Challenge

Abstract:The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The proposed Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sparse AI systems. The Sparse DNN Challenge is based on a mathematically well-defined DNN inference computation and can be implemented in any programming environment. Sparse DNN inference is amenable to both vertex-centric implementations and array-based implementations (e.g., using the GraphBLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The input data sets are derived from the MNIST handwritten letters. The surrounding I/O and verification provide the context for each sparse DNN inference that allows rigorous definition of both the input and the output. Furthermore, since the proposed sparse DNN challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Reference implementations have been implemented and their serial and parallel performance have been measured. Specifications, data, and software are publicly available at GraphChallenge.org

* 7 pages, 5 figures, 3 tables, 60 references, accepted to IEEE HPEC 2019. arXiv admin note: substantial text overlap with arXiv:1807.03165, arXiv:1708.02937, arXiv:1708.06866

Via

Access Paper or Ask Questions