Zhenglun Kong

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

Oct 26, 2021
Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin

Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization

Oct 19, 2021
Panjie Qi, Edwin Hsing-Mean Sha, Qingfeng Zhuge, Hongwu Peng, Shaoyi Huang, Zhenglun Kong, Yuhong Song, Bingbing Li

Improving DNN Fault Tolerance using Weight Pruning and Differential Crossbar Mapping for ReRAM-based Edge AI

Jun 18, 2021
Geng Yuan, Zhiheng Liao, Xiaolong Ma, Yuxuan Cai, Zhenglun Kong, Xuan Shen, Jingyan Fu, Zhengang Li, Chengming Zhang, Hongwu Peng, Ning Liu, Ao Ren, Jinhui Wang, Yanzhi Wang

A Compression-Compilation Framework for On-mobile Real-time BERT Applications

Jun 06, 2021
Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

6.7ms on Mobile with over 78% ImageNet Accuracy: Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration

Dec 01, 2020
Zhengang Li, Geng Yuan, Wei Niu, Yanyu Li, Pu Zhao, Yuxuan Cai, Xuan Shen, Zheng Zhan, Zhenglun Kong, Qing Jin, Zhiyu Chen, Sijia Liu, Kaiyuan Yang, Bin Ren, Yanzhi Wang, Xue Lin

Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning

Oct 08, 2020
Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Zhengang Li, Hang Liu, Caiwen Ding

Achieving Real-Time Execution of Transformer-based Large-scale Models on Mobile with Compiler-aware Neural Architecture Optimization

Sep 15, 2020
Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang
