Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jeremy Kepner

AI Accelerator Survey and Trends

Sep 18, 2021

Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

Figure 1 for AI Accelerator Survey and Trends

Figure 2 for AI Accelerator Survey and Trends

Figure 3 for AI Accelerator Survey and Trends

Figure 4 for AI Accelerator Survey and Trends

Abstract:Over the past several years, new machine learning accelerators were being announced and released every month for a variety of applications from speech recognition, video object detection, assisted driving, and many data center applications. This paper updates the survey of AI accelerators and processors from past two years. This paper collects and summarizes the current commercial accelerators that have been publicly announced with peak performance and power consumption numbers. The performance and power values are plotted on a scatter graph, and a number of dimensions and observations from the trends on this plot are again discussed and analyzed. This year, we also compile a list of benchmarking performance results and compute the computational efficiency with respect to peak performance.

* 9 pages, 2 figures, IEEE High Performance Extreme Computing Conference 2021

Via

Access Paper or Ask Questions

Maneuver Identification Challenge

Aug 25, 2021

Kaira Samuel, Vijay Gadepally, David Jacobs, Michael Jones, Kyle McAlpin, Kyle Palko, Ben Paulk, Sid Samsi, Ho Chit Siu, Charles Yee(+1 more)

Figure 1 for Maneuver Identification Challenge

Figure 2 for Maneuver Identification Challenge

Figure 3 for Maneuver Identification Challenge

Figure 4 for Maneuver Identification Challenge

Abstract:AI algorithms that identify maneuvers from trajectory data could play an important role in improving flight safety and pilot training. AI challenges allow diverse teams to work together to solve hard problems and are an effective tool for developing AI solutions. AI challenges are also a key driver of AI computational requirements. The Maneuver Identification Challenge hosted at maneuver-id.mit.edu provides thousands of trajectories collected from pilots practicing in flight simulators, descriptions of maneuvers, and examples of these maneuvers performed by experienced pilots. Each trajectory consists of positions, velocities, and aircraft orientations normalized to a common coordinate system. Construction of the data set required significant data architecture to transform flight simulator logs into AI ready data, which included using a supercomputer for deduplication and data conditioning. There are three proposed challenges. The first challenge is separating physically plausible (good) trajectories from unfeasible (bad) trajectories. Human labeled good and bad trajectories are provided to aid in this task. Subsequent challenges are to label trajectories with their intended maneuvers and to assess the quality of those maneuvers.

* 7 pages, 8 figures, 1 table, 33 references, accepted to IEEE HPEC 2021

Via

Access Paper or Ask Questions

The MIT Supercloud Dataset

Aug 04, 2021

Siddharth Samsi, Matthew L Weiss, David Bestor, Baolin Li, Michael Jones, Albert Reuther, Daniel Edelman, William Arcand, Chansup Byun, John Holodnack(+17 more)

Abstract:Artificial intelligence (AI) and Machine learning (ML) workloads are an increasingly larger share of the compute workloads in traditional High-Performance Computing (HPC) centers and commercial cloud systems. This has led to changes in deployment approaches of HPC clusters and the commercial cloud, as well as a new focus on approaches to optimized resource usage, allocations and deployment of new AI frame- works, and capabilities such as Jupyter notebooks to enable rapid prototyping and deployment. With these changes, there is a need to better understand cluster/datacenter operations with the goal of developing improved scheduling policies, identifying inefficiencies in resource utilization, energy/power consumption, failure prediction, and identifying policy violations. In this paper we introduce the MIT Supercloud Dataset which aims to foster innovative AI/ML approaches to the analysis of large scale HPC and datacenter/cloud operations. We provide detailed monitoring logs from the MIT Supercloud system, which include CPU and GPU usage by jobs, memory usage, file system logs, and physical monitoring data. This paper discusses the details of the dataset, collection methodology, data availability, and discusses potential challenge problems being developed using this data. Datasets and future challenge announcements will be available via https://dcc.mit.edu.

Via

Access Paper or Ask Questions

Mathematics of Digital Hyperspace

Mar 28, 2021

Jeremy Kepner, Timothy Davis, Vijay Gadepally, Hayden Jananthan, Lauren Milechin

Figure 1 for Mathematics of Digital Hyperspace

Figure 2 for Mathematics of Digital Hyperspace

Figure 3 for Mathematics of Digital Hyperspace

Figure 4 for Mathematics of Digital Hyperspace

Abstract:Social media, e-commerce, streaming video, e-mail, cloud documents, web pages, traffic flows, and network packets fill vast digital lakes, rivers, and oceans that we each navigate daily. This digital hyperspace is an amorphous flow of data supported by continuous streams that stretch standard concepts of type and dimension. The unstructured data of digital hyperspace can be elegantly represented, traversed, and transformed via the mathematics of hypergraphs, hypersparse matrices, and associative array algebra. This paper explores a novel mathematical concept, the semilink, that combines pairs of semirings to provide the essential operations for graph analytics, database operations, and machine learning. The GraphBLAS standard currently supports hypergraphs, hypersparse matrices, the mathematics required for semilinks, and seamlessly performs graph, network, and matrix operations. With the addition of key based indices (such as pointers to strings) and semilinks, GraphBLAS can become a richer associative array algebra and be a plug-in replacement for spreadsheets, database tables, and data centric operating systems, enhancing the navigation of unstructured data found in digital hyperspace.

* 9 pages, 8 figures, 2 tables, accepted to GrAPL 2021. arXiv admin note: text overlap with arXiv:1807.03165, arXiv:2004.01181, arXiv:1909.05631, arXiv:1708.02937

Via

Access Paper or Ask Questions

Survey of Machine Learning Accelerators

Sep 01, 2020

Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

Figure 1 for Survey of Machine Learning Accelerators

Figure 2 for Survey of Machine Learning Accelerators

Abstract:New machine learning accelerators are being announced and released each month for a variety of applications from speech recognition, video object detection, assisted driving, and many data center applications. This paper updates the survey of of AI accelerators and processors from last year's IEEE-HPEC paper. This paper collects and summarizes the current accelerators that have been publicly announced with performance and power consumption numbers. The performance and power values are plotted on a scatter graph and a number of dimensions and observations from the trends on this plot are discussed and analyzed. For instance, there are interesting trends in the plot regarding power consumption, numerical precision, and inference versus training. This year, there are many more announced accelerators that are implemented with many more architectures and technologies from vector engines, dataflow engines, neuromorphic designs, flash-based analog memory processing, and photonic-based processing.

* 12 pages, 2 figures, IEEE-HPEC conference, Waltham, MA, September 21-25, 2020. arXiv admin note: text overlap with arXiv:1908.11348

Via

Access Paper or Ask Questions

Accuracy and Performance Comparison of Video Action Recognition Approaches

Aug 20, 2020

Matthew Hutchinson, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Micheal Houle, Matthew Hubbell, Micheal Jones, Jeremy Kepner(+9 more)

Figure 1 for Accuracy and Performance Comparison of Video Action Recognition Approaches

Figure 2 for Accuracy and Performance Comparison of Video Action Recognition Approaches

Figure 3 for Accuracy and Performance Comparison of Video Action Recognition Approaches

Figure 4 for Accuracy and Performance Comparison of Video Action Recognition Approaches

Abstract:Over the past few years, there has been significant interest in video action recognition systems and models. However, direct comparison of accuracy and computational performance results remain clouded by differing training environments, hardware specifications, hyperparameters, pipelines, and inference methods. This article provides a direct comparison between fourteen off-the-shelf and state-of-the-art models by ensuring consistency in these training characteristics in order to provide readers with a meaningful comparison across different types of video action recognition algorithms. Accuracy of the models is evaluated using standard Top-1 and Top-5 accuracy metrics in addition to a proposed new accuracy metric. Additionally, we compare computational performance of distributed training from two to sixty-four GPUs on a state-of-the-art HPC system.

* Accepted for publication at IEEE HPEC 2020

Via

Access Paper or Ask Questions

Benchmarking network fabrics for data distributed training of deep neural networks

Aug 18, 2020

Siddharth Samsi, Andrew Prout, Michael Jones, Andrew Kirby, Bill Arcand, Bill Bergeron, David Bestor, Chansup Byun, Vijay Gadepally, Michael Houle(+9 more)

Figure 1 for Benchmarking network fabrics for data distributed training of deep neural networks

Figure 2 for Benchmarking network fabrics for data distributed training of deep neural networks

Figure 3 for Benchmarking network fabrics for data distributed training of deep neural networks

Figure 4 for Benchmarking network fabrics for data distributed training of deep neural networks

Abstract:Artificial Intelligence/Machine Learning applications require the training of complex models on large amounts of labelled data. The large computational requirements for training deep models have necessitated the development of new methods for faster training. One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes. This approach is simple to implement and supported by most of the commonly used machine learning frameworks. The data parallel approach leverages MPI for communicating gradients across all nodes. In this paper, we examine the effects of using different physical hardware interconnects and network-related software primitives for enabling data distributed deep learning. We compare the effect of using GPUDirect and NCCL on Ethernet and OmniPath fabrics. Our results show that using Ethernet-based networking in shared HPC systems does not have a significant effect on the training times for commonly used deep neural network architectures or traditional HPC applications such as Computational Fluid Dynamics.

* Accepted for publication at IEEE HPEC 2020

Via

Access Paper or Ask Questions

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks Via Nonlinear Multigrid

Jul 14, 2020

Andrew C. Kirby, Siddharth Samsi, Michael Jones, Albert Reuther, Jeremy Kepner, Vijay Gadepally

Figure 1 for Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks Via Nonlinear Multigrid

Figure 2 for Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks Via Nonlinear Multigrid

Figure 3 for Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks Via Nonlinear Multigrid

Figure 4 for Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks Via Nonlinear Multigrid

Abstract:A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs. This work demonstrates a 10.2x speedup over traditional layer-wise model parallelism techniques using the same number of compute units.

* 7 pages, 6 figures, submitted to 2020 IEEE High Performance Extreme Computing Conference

Via

Access Paper or Ask Questions

GraphChallenge.org Sparse Deep Neural Network Performance

Apr 06, 2020

Jeremy Kepner, Simon Alford, Vijay Gadepally, Michael Jones, Lauren Milechin, Albert Reuther, Ryan Robinett, Sid Samsi

Figure 1 for GraphChallenge.org Sparse Deep Neural Network Performance

Figure 2 for GraphChallenge.org Sparse Deep Neural Network Performance

Figure 3 for GraphChallenge.org Sparse Deep Neural Network Performance

Figure 4 for GraphChallenge.org Sparse Deep Neural Network Performance

Abstract:The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sparse AI systems. The sparse DNN challenge is based on a mathematically well-defined DNN inference computation and can be implemented in any programming environment. In 2019 several sparse DNN challenge submissions were received from a wide range of authors and organizations. This paper presents a performance analysis of the best performers of these submissions. These submissions show that their state-of-the-art sparse DNN execution time, $T_{\rm DNN}$, is a strong function of the number of DNN operations performed, $N_{\rm op}$. The sparse DNN challenge provides a clear picture of current sparse DNN systems and underscores the need for new innovations to achieve high performance on very large sparse DNNs.

* 7 pages, 7 figures, 80 references, to be submitted to IEEE HPEC 2020. This work reports new updated results on prior work reported in arXiv:1909.05631. arXiv admin note: substantial text overlap with arXiv:1807.03165, arXiv:1708.02937. arXiv admin note: text overlap with arXiv:2003.09269

Via

Access Paper or Ask Questions

Sparse Deep Neural Network Graph Challenge

Sep 02, 2019

Jeremy Kepner, Simon Alford, Vijay Gadepally, Michael Jones, Lauren Milechin, Ryan Robinett, Sid Samsi

Figure 1 for Sparse Deep Neural Network Graph Challenge

Figure 2 for Sparse Deep Neural Network Graph Challenge

Figure 3 for Sparse Deep Neural Network Graph Challenge

Figure 4 for Sparse Deep Neural Network Graph Challenge

Abstract:The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The proposed Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sparse AI systems. The Sparse DNN Challenge is based on a mathematically well-defined DNN inference computation and can be implemented in any programming environment. Sparse DNN inference is amenable to both vertex-centric implementations and array-based implementations (e.g., using the GraphBLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The input data sets are derived from the MNIST handwritten letters. The surrounding I/O and verification provide the context for each sparse DNN inference that allows rigorous definition of both the input and the output. Furthermore, since the proposed sparse DNN challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Reference implementations have been implemented and their serial and parallel performance have been measured. Specifications, data, and software are publicly available at GraphChallenge.org

* 7 pages, 5 figures, 3 tables, 60 references, accepted to IEEE HPEC 2019. arXiv admin note: substantial text overlap with arXiv:1807.03165, arXiv:1708.02937, arXiv:1708.06866

Via

Access Paper or Ask Questions