Wen-mei Hwu

The Design and Implementation of a Scalable DL Benchmarking Platform

Nov 19, 2019
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

Benanza: Automatic $μ$Benchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs

Nov 19, 2019
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs

Nov 18, 2019
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving

Nov 18, 2019
Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, Jinjun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen

Benanza: Automatic uBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs

Nov 16, 2019
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems

Sep 20, 2019
Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

Across-Stack Profiling and Characterization of Machine Learning Models on GPUs

Aug 19, 2019
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, Wen-mei Hwu

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection

Jul 09, 2019
Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices

May 20, 2019
Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen
