Cunxi Yu

Verilog-to-PyG -- A Framework for Graph Learning and Augmentation on RTL Designs

Nov 09, 2023
Yingjie Li, Mingju Liu, Alan Mishchenko, Cunxi Yu

The complexity of modern hardware designs necessitates advanced methodologies for optimizing and analyzing digital systems. Recently, machine learning (ML) methodologies have emerged as potent instruments for assessing design quality-of-results at the Register-Transfer Level (RTL) or Boolean level, aiming to expedite design exploration of advanced RTL configurations. In this presentation, we introduce an innovative open-source framework that translates RTL designs into graph representation foundations, which can be seamlessly integrated with the PyTorch Geometric graph learning platform. Furthermore, the Verilog-to-PyG (V2PYG) framework is compatible with the open-source Electronic Design Automation (EDA) toolchain OpenROAD, facilitating the collection of labeled datasets in a fully open-source manner. Additionally, we will present novel RTL data augmentation methods (incorporated in our framework) that enable functionally equivalent design augmentation for the construction of an extensive graph-based RTL design database. Lastly, we will showcase several use cases of V2PYG with detailed scripting examples. V2PYG can be found at https://yu-maryland.github.io/Verilog-to-PyG/.

* 8 pages, International Conference on Computer-Aided Design (ICCAD'23) 
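
As a rough illustration of what translating an RTL design into a graph representation can look like, the sketch below turns a toy gate-level netlist into node features and a two-row edge list (sources and targets), which is the layout PyTorch Geometric uses for `edge_index`. The netlist dictionary, the `netlist_to_graph` helper, and the one-hot gate encoding are assumptions made for illustration; they are not the V2PYG API.

```python
# Hypothetical toy netlist: signal name -> (gate type, list of fan-in signals).
NETLIST = {
    "a": ("input", []),
    "b": ("input", []),
    "c": ("input", []),
    "n1": ("and", ["a", "b"]),
    "n2": ("xor", ["n1", "c"]),
    "y": ("output", ["n2"]),
}

GATE_TYPES = ["input", "and", "xor", "output"]


def netlist_to_graph(netlist):
    """Return (node_names, x, edge_index) for a netlist dictionary.

    x holds one one-hot gate-type vector per node.
    edge_index is [sources, targets], one column per driver -> sink wire.
    """
    node_names = list(netlist)
    index = {name: i for i, name in enumerate(node_names)}
    x = [[1.0 if gate == t else 0.0 for t in GATE_TYPES]
         for gate, _ in netlist.values()]
    sources, targets = [], []
    for sink, (_, fanins) in netlist.items():
        for driver in fanins:
            sources.append(index[driver])
            targets.append(index[sink])
    return node_names, x, [sources, targets]


node_names, x, edge_index = netlist_to_graph(NETLIST)
print(len(node_names), "nodes,", len(edge_index[0]), "edges")
```

With PyTorch Geometric installed, `x` and `edge_index` could be converted to tensors and wrapped in `torch_geometric.data.Data(x=..., edge_index=...)` for use with graph learning models.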

Accelerating Exact Combinatorial Optimization via RL-based Initialization -- A Case Study in Scheduling

Aug 19, 2023
Jiaqi Yin, Cunxi Yu

Scheduling on dataflow graphs (also known as computation graphs) is an NP-hard problem. Traditional exact methods are limited by runtime complexity, while reinforcement learning (RL) and heuristic-based approaches struggle with determinism and solution quality. This research aims to develop an innovative approach that employs machine learning (ML) to address combinatorial optimization problems, using scheduling as a case study. The goal is to provide guarantees of optimality and determinism while maintaining the runtime cost of heuristic methods. Specifically, we introduce a novel two-phase RL-to-ILP scheduling framework, which includes three steps: 1) an RL solver acts as a coarse-grain scheduler, 2) solution relaxation, and 3) exact solving via ILP. Our framework demonstrates the same scheduling performance as exact scheduling methods while achieving up to 128$\times$ speed improvements. This was conducted on actual EdgeTPU platforms, utilizing ImageNet DNN computation graphs as input. Additionally, the framework offers improved on-chip inference runtime and acceleration compared to the commercially available EdgeTPU compiler.

* International Conference on Computer-Aided Design 2023 (ICCAD) 
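
The three steps named in the abstract (coarse schedule, relaxation, exact solving) can be sketched on a toy problem. Everything here is an assumption for illustration: the four-operation graph, the unit latencies, the per-step resource limit, the hand-written coarse schedule that stands in for the RL solver, and the exhaustive search that stands in for the ILP solver.

```python
from itertools import product

# Toy dataflow graph: edges point from producer to consumer (all ops take 1 step).
EDGES = [("a", "c"), ("b", "c"), ("c", "d")]
OPS = ["a", "b", "c", "d"]
MAX_OPS_PER_STEP = 2  # hypothetical resource limit


def is_feasible(schedule):
    """Check dependency order and the per-step resource limit."""
    if any(schedule[u] >= schedule[v] for u, v in EDGES):
        return False
    steps = list(schedule.values())
    return all(steps.count(s) <= MAX_OPS_PER_STEP for s in set(steps))


def refine(coarse, window=1):
    """Exhaustively search start steps within +/- window of the coarse schedule."""
    ranges = [range(max(0, coarse[op] - window), coarse[op] + window + 1)
              for op in OPS]
    best = None
    for steps in product(*ranges):
        candidate = dict(zip(OPS, steps))
        if is_feasible(candidate):
            if best is None or max(steps) < max(best.values()):
                best = candidate
    return best


# Step 1 stand-in: a feasible but suboptimal schedule, as a learned policy might emit.
coarse_schedule = {"a": 0, "b": 1, "c": 2, "d": 3}
# Steps 2 and 3: relax each op to a small window, then solve exactly inside it.
refined = refine(coarse_schedule, window=1)
print(refined, "makespan:", max(refined.values()) + 1)
```

The window restricts the search to a neighborhood of the coarse schedule, which is what keeps the exact step cheap in this sketch; the result is only as good as the best schedule inside that neighborhood.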

Rubik's Optical Neural Networks: Multi-task Learning with Physics-aware Rotation Architecture

May 02, 2023
Yingjie Li, Weilu Gao, Cunxi Yu

Recently, there have been increasing efforts to advance optical neural networks (ONNs), which bring significant advantages for machine learning (ML) in terms of power efficiency, parallelism, and computational speed. Given the considerable benefits in computation speed and energy efficiency, there is significant interest in applying ONNs to medical sensing, security screening, drug detection, and autonomous driving. However, due to the challenge of implementing reconfigurability, deploying multi-task learning (MTL) algorithms on ONNs requires re-building and duplicating the physical diffractive systems, which significantly degrades the energy and cost efficiency in practical application scenarios. This work presents a novel ONN architecture, namely RubikONNs, which utilizes the physical properties of optical systems to encode multiple feed-forward functions by physically rotating the hardware, similarly to rotating a Rubik's Cube. To optimize MTL performance on RubikONNs, two domain-specific physics-aware training algorithms, RotAgg and RotSeq, are proposed. Our experimental results demonstrate more than 4$\times$ improvements in energy and cost efficiency with marginal accuracy degradation compared to the state-of-the-art approaches.

* To appear at 32nd International Joint Conference on Artificial Intelligence (IJCAI'23) 

RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral Edge TPUs

Apr 10, 2023
Jiaqi Yin, Yingjie Li, Daniel Robinson, Cunxi Yu

Deep neural networks (DNNs) have substantial computational and memory requirements, and the compilation of their computational graphs has a great impact on the performance of resource-constrained (e.g., computation, I/O, and memory-bound) edge computing systems. While efficient execution of these computational graphs requires an effective scheduling algorithm, generating the optimal scheduling solution is a challenging NP-hard problem. Furthermore, the complexity of scheduling DNN computational graphs increases further on pipelined multi-core systems when memory communication cost is considered, and as DNNs grow in size. Using synthetic graphs as the training dataset, this work presents a reinforcement learning (RL) based scheduling framework, RESPECT, which learns the behaviors of optimal optimization algorithms and generates near-optimal scheduling results with short solving runtime overhead. Our framework has demonstrated up to $\sim2.5\times$ real-world on-chip inference runtime speedups over the commercial compiler with ten popular ImageNet models deployed on the physical Coral Edge TPUs system. Moreover, compared to the exact optimization methods, the proposed RL scheduling improves the scheduling optimization runtime by up to 683$\times$ compared to the commercial compiler and matches the exact optimal solutions with up to 930$\times$ speedups. Finally, we perform a comprehensive generalizability test, which demonstrates that RESPECT successfully imitates optimal solving behaviors from small synthetic graphs to large real-world DNN computational graphs.

* 6 pages, ACM/IEEE Design Automation Conference (DAC'23) 

Physics-aware Roughness Optimization for Diffractive Optical Neural Networks

Apr 04, 2023
Shanglin Zhou, Yingjie Li, Minhan Lou, Weilu Gao, Zhijie Shi, Cunxi Yu, Caiwen Ding

As a representative next-generation device/circuit technology beyond CMOS, diffractive optical neural networks (DONNs) have shown promising advantages over conventional deep neural networks due to extremely fast computation speed (light speed) and low energy consumption. However, there is a mismatch, i.e., significant prediction accuracy loss, between DONN numerical modeling and physical optical device deployment, because of the interpixel interaction within the diffractive layers. In this work, we propose a physics-aware diffractive optical neural network training framework to reduce the performance difference between numerical modeling and practical deployment. Specifically, we propose roughness modeling regularization in the training process and integrate a physics-aware sparsification method that introduces sparsity to the phase masks to reduce sharp phase changes between adjacent pixels in diffractive layers. We further develop $2\pi$ periodic optimization to reduce the roughness of the phase masks while preserving the performance of the DONN. Experimental results demonstrate that, compared to state-of-the-art methods, our physics-aware optimization can provide $35.7\%$, $34.2\%$, $28.1\%$, and $27.3\%$ reduction in roughness with only accuracy loss on MNIST, FMNIST, KMNIST, and EMNIST, respectively.

* This paper is accepted by the Design Automation Conference (DAC), 2023 
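
The abstract does not give the roughness formula, so the sketch below assumes a simple one: the mean absolute phase difference between each pixel and its right and lower neighbors. It then illustrates the idea behind $2\pi$ periodic optimization under the assumption of ideal phase modulation, where adding $2\pi$ to a pixel leaves $e^{j\phi}$ unchanged but can bring the pixel closer to its neighbors. The `roughness` and `greedy_two_pi_shift` helpers and the 2x2 mask are hypothetical.

```python
import math

TWO_PI = 2 * math.pi


def roughness(mask):
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    diffs = []
    rows, cols = len(mask), len(mask[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                diffs.append(abs(mask[r][c] - mask[r][c + 1]))
            if r + 1 < rows:
                diffs.append(abs(mask[r][c] - mask[r + 1][c]))
    return sum(diffs) / len(diffs)


def greedy_two_pi_shift(mask):
    """Add 2*pi to a pixel whenever doing so lowers the overall roughness."""
    out = [row[:] for row in mask]
    for r in range(len(out)):
        for c in range(len(out[0])):
            before = roughness(out)
            original = out[r][c]
            out[r][c] = original + TWO_PI
            if roughness(out) >= before:
                out[r][c] = original  # revert: the shift did not help
    return out


# Hypothetical 2x2 phase mask (radians) with one pixel far below its neighbors.
mask = [[6.0, 6.1],
        [0.2, 6.0]]
smoothed = greedy_two_pi_shift(mask)
print(round(roughness(mask), 3), "->", round(roughness(smoothed), 3))
```

Only the low pixel is shifted in this example; the other three stay as they were because shifting them would increase the differences to their neighbors.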

Physics-aware Differentiable Discrete Codesign for Diffractive Optical Neural Networks

Sep 28, 2022
Yingjie Li, Ruiyang Chen, Weilu Gao, Cunxi Yu

Diffractive optical neural networks (DONNs) have attracted considerable attention as they bring significant advantages in terms of power efficiency, parallelism, and computational speed compared with conventional deep neural networks (DNNs), which have intrinsic limitations when implemented on digital platforms. However, inversely mapping algorithm-trained physical model parameters onto real-world optical devices with discrete values is a non-trivial task, as existing optical devices have non-unified discrete levels and non-monotonic properties. This work proposes a novel device-to-system hardware-software codesign framework, which enables efficient physics-aware training of DONNs with respect to arbitrary experimentally measured optical devices across layers. Specifically, Gumbel-Softmax is employed to enable differentiable discrete mapping from real-world device parameters into the forward function of DONNs, where the physical parameters in DONNs can be trained by simply minimizing the loss function of the ML task. The results demonstrate that our proposed framework offers significant advantages over conventional quantization-based methods, especially with low-precision optical devices. Finally, the proposed algorithm is fully verified with physical experimental optical systems in low-precision settings.

* International Conference on Computer-Aided Design (ICCAD'2022) To appear 
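
To make the Gumbel-Softmax mapping mentioned in the abstract more concrete, here is a minimal forward-pass sketch in plain Python. It samples a relaxed one-hot vector over a set of discrete device phase levels and mixes the levels with those weights. The device levels, the logits, and the temperature are made-up values, and the sketch does not compute gradients; in an actual training setup an autodiff implementation such as `torch.nn.functional.gumbel_softmax` would typically be used instead.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Sample a relaxed one-hot vector over the choices scored by `logits`."""
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep the logs finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical measured device: non-uniform, non-monotonic phase levels (radians).
DEVICE_PHASES = [0.0, 1.9, 1.2, 4.7]
logits = [0.1, 2.0, 0.3, -1.0]  # trainable scores for one pixel (made up)

rng = random.Random(0)
weights = gumbel_softmax(logits, tau=0.5, rng=rng)
# Relaxed phase: a weighted mix of device levels; lower tau sharpens the weights.
soft_phase = sum(w * p for w, p in zip(weights, DEVICE_PHASES))
print([round(w, 3) for w in weights], round(soft_phase, 3))
```

Because the weights are a smooth function of the logits, a loss computed from `soft_phase` can in principle be backpropagated to the logits, which is what makes the discrete device choice trainable.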

Multi-Task Learning in Diffractive Deep Neural Networks via Hardware-Software Co-design

Dec 16, 2020
Yingjie Li, Ruiyang Chen, Berardi Sensale Rodriguez, Weilu Gao, Cunxi Yu

Deep neural networks (DNNs) have substantial computational requirements, which greatly limit their performance in resource-constrained environments. Recently, there have been increasing efforts on optical neural networks and optical computing based DNN hardware, which bring significant advantages for deep learning systems in terms of power efficiency, parallelism, and computational speed. Among them, free-space diffractive deep neural networks (D$^2$NNs), based on light diffraction, feature millions of neurons in each layer interconnected with neurons in neighboring layers. However, due to the challenge of implementing reconfigurability, deploying different DNN algorithms requires re-building and duplicating the physical diffractive systems, which significantly degrades the hardware efficiency in practical application scenarios. Thus, this work proposes a novel hardware-software co-design method that enables robust and noise-resilient multi-task learning in D$^2$NNs. Our experimental results demonstrate significant improvements in versatility and hardware efficiency, and also demonstrate the robustness of the proposed multi-task D$^2$NN architecture under wide noise ranges of all system components. In addition, we propose a domain-specific regularization algorithm for training the proposed multi-task architecture, which can be used to flexibly adjust the desired performance for each task.

* 5 figures, 1 table 

Contrastive Weight Regularization for Large Minibatch SGD

Nov 17, 2020
Qiwei Yuan, Weizhe Hua, Yi Zhou, Cunxi Yu

The minibatch stochastic gradient descent (SGD) method is widely applied in deep learning due to its efficiency and scalability, which enable training deep networks with a large volume of data. Particularly in the distributed setting, SGD is usually applied with a large batch size. However, as opposed to small-batch SGD, neural network models trained with large-batch SGD often generalize poorly, i.e., the validation accuracy is low. In this work, we introduce a novel regularization technique, namely distinctive regularization (DReg), which replicates a certain layer of the deep network and encourages the parameters of both layers to be diverse. The DReg technique introduces very little computation overhead. Moreover, we empirically show that optimizing the neural network with DReg using large-batch SGD achieves a significant boost in convergence and improved generalization performance. We also demonstrate that DReg can boost the convergence of large-batch SGD with momentum. We believe that DReg can be used as a simple regularization trick to accelerate large-batch training in deep learning.

Painting on Placement: Forecasting Routing Congestion using Conditional Generative Adversarial Nets

Apr 15, 2019
Cunxi Yu, Zhiru Zhang

The physical design process commonly takes hours to days for large designs, and routing is known as the most critical step. Demand for accurate routing quality prediction has risen to a new level as a way to accelerate hardware innovation with advanced technology nodes. This work presents an approach that forecasts the density of all routing channels over the entire floorplan, with features collected up to placement, using conditional GANs. Specifically, forecasting the routing congestion is formulated as an image translation (colorization) problem. The proposed approach is applied to a) placement exploration for minimum congestion, b) constrained placement exploration, and c) forecasting congestion in real time during incremental placement, using eight designs targeting a fixed FPGA architecture.

* 6 pages, 9 figures, to appear at DAC'19 