Mingjie Liu

ChipNeMo: Domain-Adapted LLMs for Chip Design

Nov 13, 2023
Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Brucek Khailany, Kishor Kunal, Xiaowei Li, Hao Liu, Stuart Oberman, Sujeet Omar, Sreedhar Pratty, Jonathan Raiman, Ambar Sarkar, Zhengjiang Shao, Hanfei Sun, Pratik P Suthar, Varun Tej, Kaizhe Xu, Haoxing Ren

ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Rather than directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: custom tokenizers, domain-adaptive continued pretraining, supervised fine-tuning (SFT) with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our results show that these domain adaptation techniques enable significant LLM performance improvements over general-purpose base models across the three evaluated applications, enabling up to 5x model size reduction with similar or better performance on a range of design tasks. Our findings also indicate that there is still room for improvement between our current results and ideal outcomes. We believe that further investigation of domain-adapted LLM approaches will help close this gap in the future.

VerilogEval: Evaluating Large Language Models for Verilog Code Generation

Sep 14, 2023
Mingjie Liu, Nathaniel Pinckney, Brucek Khailany, Haoxing Ren

The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating LLM performance in the context of Verilog code generation for hardware design and verification. We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits. The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines. The Verilog code completions can be automatically tested for functional correctness by comparing the transient simulation outputs of the generated design with a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models could be improved with supervised fine-tuning by bootstrapping with LLM generated synthetic problem-code pairs.

* ICCAD 2023 Invited Paper 
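
The VerilogEval abstract above describes checking functional correctness by comparing the simulation outputs of a generated design against a golden solution. The following is a minimal sketch of that comparison step only, assuming the simulator output has already been parsed into per-time-step signal values. The `Trace` type and the functions `traces_match` and `pass_fraction` are hypothetical names used for illustration; they are not taken from the benchmark's code.

```python
from typing import Dict, List

# Assumption: one dict of {signal name: value} per simulated time step.
Trace = List[Dict[str, int]]


def traces_match(candidate: Trace, golden: Trace) -> bool:
    """True when both traces have the same length and equal values at every step."""
    if len(candidate) != len(golden):
        return False
    return all(c == g for c, g in zip(candidate, golden))


def pass_fraction(candidates: List[Trace], golden: Trace) -> float:
    """Fraction of candidate completions whose trace matches the golden trace."""
    if not candidates:
        return 0.0
    return sum(traces_match(c, golden) for c in candidates) / len(candidates)


golden_trace = [{"out": 0}, {"out": 1}, {"out": 1}]
good_trace = [{"out": 0}, {"out": 1}, {"out": 1}]
bad_trace = [{"out": 0}, {"out": 0}, {"out": 1}]
print(pass_fraction([good_trace, bad_trace], golden_trace))
```

A real flow would obtain the traces by running both designs under the same testbench; that step is left out here because it depends on the simulator in use.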

An Adversarial Active Sampling-based Data Augmentation Framework for Manufacturable Chip Design

Oct 27, 2022
Mingjie Liu, Haoyu Yang, Zongyi Li, Kumara Sastry, Saumyadip Mukhopadhyay, Selim Dogru, Anima Anandkumar, David Z. Pan, Brucek Khailany, Haoxing Ren

Lithography modeling is a crucial problem in chip design to ensure a chip design mask is manufacturable. It requires rigorous simulations of optical and chemical models that are computationally expensive. Recent developments in machine learning have provided alternative solutions that replace the time-consuming lithography simulations with deep neural networks. However, the considerable accuracy drop still impedes their industrial adoption. Most importantly, the quality and quantity of the training dataset directly affect the model performance. To tackle this problem, we propose a litho-aware data augmentation (LADA) framework to resolve the dilemma of limited data and improve the machine learning model performance. First, we pretrain the neural networks for lithography modeling and a gradient-friendly StyleGAN2 generator. We then perform adversarial active sampling to generate informative and synthetic in-distribution mask designs. These synthetic mask images will augment the original limited training dataset used to finetune the lithography model for improved performance. Experimental results demonstrate that LADA can successfully exploit the neural network capacity by narrowing down the performance gap between the training and testing data instances.

Delving into Effective Gradient Matching for Dataset Condensation

Jul 30, 2022
Zixuan Jiang, Jiaqi Gu, Mingjie Liu, David Z. Pan

As deep learning models and datasets rapidly scale up, network training is extremely time-consuming and resource-costly. Instead of training on the entire dataset, learning with a small synthetic dataset becomes an efficient solution. Extensive research has explored dataset condensation, among which gradient matching achieves state-of-the-art performance. The gradient matching method directly targets the training dynamics by matching the gradient when training on the original and synthetic datasets. However, there have been few in-depth investigations into the principle and effectiveness of this method. In this work, we delve into the gradient matching method from a comprehensive perspective and answer the critical questions of what, how, and where to match. We propose to match the multi-level gradients to involve both intra-class and inter-class gradient information. We demonstrate that the distance function should focus on the angle while also considering the magnitude to delay overfitting. An overfitting-aware adaptive learning step strategy is also proposed to trim unnecessary optimization steps for algorithmic efficiency improvement. Ablation and comparison experiments demonstrate that our proposed methodology shows superior accuracy, efficiency, and generalization compared to prior work.

* 12 pages 
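
The dataset condensation abstract above argues that the gradient-matching distance should focus on the angle between gradients while also taking their magnitude into account. The sketch below shows one way such a distance could look: a cosine term plus a weighted Euclidean term. The exact formula and weighting used in the paper are not given in the abstract, so the function `angle_magnitude_distance` and its `magnitude_weight` parameter are assumptions made for illustration.

```python
import math
from typing import Sequence


def angle_magnitude_distance(
    g_real: Sequence[float],
    g_syn: Sequence[float],
    magnitude_weight: float = 1.0,  # assumed knob, not from the abstract
) -> float:
    """Cosine distance (angle) plus a weighted Euclidean distance (magnitude)."""
    dot = sum(a * b for a, b in zip(g_real, g_syn))
    norm_real = math.sqrt(sum(a * a for a in g_real))
    norm_syn = math.sqrt(sum(b * b for b in g_syn))
    if norm_real == 0.0 or norm_syn == 0.0:
        # Angle is undefined for a zero gradient; treat it as maximally unaligned.
        angle_term = 1.0
    else:
        angle_term = 1.0 - dot / (norm_real * norm_syn)
    magnitude_term = math.sqrt(sum((a - b) ** 2 for a, b in zip(g_real, g_syn)))
    return angle_term + magnitude_weight * magnitude_term


# Same direction, different length: only the magnitude term contributes.
print(angle_magnitude_distance([1.0, 0.0], [2.0, 0.0]))
```

With `magnitude_weight=0.0` the function reduces to a pure cosine distance, which ignores how large the synthetic gradients are.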

RobustAnalog: Fast Variation-Aware Analog Circuit Design Via Multi-task RL

Jul 13, 2022
Wei Shi, Hanrui Wang, Jiaqi Gu, Mingjie Liu, David Pan, Song Han, Nan Sun

Analog/mixed-signal circuit design is one of the most complex and time-consuming stages in the whole chip design process. Due to various process, voltage, and temperature (PVT) variations from chip manufacturing, analog circuits inevitably suffer from performance degradation. Although there has been plenty of work on automating analog circuit design under the typical condition, limited research has been done on exploring robust designs under real and unpredictable silicon variations. Automatic analog design against variations requires prohibitive computation and time costs. To address the challenge, we present RobustAnalog, a robust circuit design framework that involves the variation information in the optimization process. Specifically, circuit optimizations under different variations are considered as a set of tasks. Similarities among tasks are leveraged and competitions are alleviated to realize sample-efficient multi-task training. Moreover, RobustAnalog prunes the task space according to the current performance in each iteration, leading to a further simulation cost reduction. In this way, RobustAnalog can rapidly produce a set of circuit parameters that satisfies diverse constraints (e.g., gain, bandwidth, and noise) across variations. We compare RobustAnalog with Bayesian optimization, an evolutionary algorithm, and Deep Deterministic Policy Gradient (DDPG) and demonstrate that RobustAnalog can significantly reduce the required optimization time by 14-30 times. Therefore, our study provides a feasible method to handle various real silicon conditions.

ELight: Enabling Efficient Photonic In-Memory Neurocomputing with Life Enhancement

Dec 15, 2021
Hanqing Zhu, Jiaqi Gu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen, David Z. Pan

With the recent advances in optical phase change material (PCM), photonic in-memory neurocomputing has demonstrated its superiority in optical neural network (ONN) designs with near-zero static power consumption, time-of-light latency, and compact footprint. However, photonic tensor cores require massive hardware reuse to implement large matrix multiplication due to the limited single-core scale. The resultant large number of PCM writes leads to serious dynamic power and overwhelms the fragile PCM with limited write endurance. In this work, we propose a synergistic optimization framework, ELight, to minimize the overall write efforts for efficient and reliable optical in-memory neurocomputing. We first propose write-aware training to encourage the similarity among weight blocks, and combine it with a post-training optimization method to reduce programming efforts by eliminating redundant writes. Experiments show that ELight can achieve over 20X reduction in the total number of writes and dynamic power with comparable accuracy. With our ELight, photonic in-memory neurocomputing will step forward towards viable applications in machine learning with preserved accuracy, order-of-magnitude longer lifetime, and lower programming energy.

* 7 pages, 8 figures, accepted by ASPDAC 2022 
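
The ELight abstract above describes reducing programming effort by eliminating redundant writes when many weight blocks are mapped onto the same photonic core in sequence. The sketch below illustrates only that counting idea under a simplifying assumption: a cell needs a write only when its new value differs from the value it already holds. The function `count_writes` and the toy blocks are hypothetical and do not reproduce the paper's optimization method.

```python
from typing import List, Optional, Sequence


def count_writes(blocks: Sequence[Sequence[int]], skip_redundant: bool = True) -> int:
    """Count cell writes needed to program each block in turn onto one core."""
    current: Optional[List[int]] = None  # values currently held by the cells
    writes = 0
    for block in blocks:
        for i, value in enumerate(block):
            if skip_redundant and current is not None and current[i] == value:
                continue  # cell already holds this value, no write needed
            writes += 1
        current = list(block)
    return writes


# Similar consecutive blocks leave most cells untouched.
weight_blocks = [[1, 2, 3], [1, 2, 4], [1, 2, 4]]
print(count_writes(weight_blocks, skip_redundant=False))
print(count_writes(weight_blocks, skip_redundant=True))
```

The more alike consecutive blocks are, the fewer writes remain, which is why the abstract pairs this step with training that encourages similarity among weight blocks.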

Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation

Sep 05, 2021
Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen, David Z. Pan

Deep neural networks (DNNs) have shown superior performance in a variety of tasks. As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices. Though extensive efficient accelerator designs, from traditional electronics to emerging photonics, have been successfully demonstrated, they are still bottlenecked by expensive memory accesses due to tremendous gaps between the bandwidth/power/latency of electrical memory and computing cores. Previous solutions fail to fully leverage the ultra-fast computational speed of emerging DNN accelerators to break through the critical memory bound. In this work, we propose a general and unified framework to trade expensive memory transactions for ultra-fast on-chip computations, directly translating to performance improvement. We are the first to jointly explore the intrinsic correlations and bit-level redundancy within DNN kernels and propose a multi-level in situ generation mechanism with mixed-precision bases to achieve on-the-fly recovery of high-resolution parameters with minimum hardware overhead. Extensive experiments demonstrate that our proposed joint method can boost the memory efficiency by 10-20x with comparable accuracy over four state-of-the-art designs, when benchmarked on ResNet-18/DenseNet-121/MobileNetV2/V3 with various tasks.

* Accepted by International Conference on Computer Vision (ICCV) 2021 

Optimizer Fusion: Efficient Training with Better Locality and Parallelism

Apr 01, 2021
Zixuan Jiang, Jiaqi Gu, Mingjie Liu, Keren Zhu, David Z. Pan

Machine learning frameworks adopt iterative optimizers to train neural networks. Conventional eager execution separates the updating of trainable parameters from forward and backward computations. However, this approach introduces nontrivial training time overhead due to the lack of data locality and computation parallelism. In this work, we propose to fuse the optimizer with forward or backward computation to better leverage locality and parallelism during training. By reordering the forward computation, gradient calculation, and parameter updating, our proposed method improves the efficiency of iterative optimizers. Experimental results demonstrate that we can achieve up to a 20% reduction in training time on various configurations. Since our methods do not alter the optimizer algorithm, they can be used as a general "plug-in" technique to the training process.

* Published as a paper at the Hardware Aware Efficient Training (HAET) workshop of ICLR 2021; 4 pages excluding references and appendices 
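
The Optimizer Fusion abstract above describes reordering gradient calculation and parameter updating so that the update happens alongside the backward computation, without changing the optimizer algorithm. The toy sketch below illustrates that reordering on a chain of scalar multiplications with plain SGD. It shows only that the fused ordering yields the same parameters as the conventional one; it does not measure locality or speed. The model, the loss (the output itself), and the function names are assumptions chosen for illustration.

```python
from typing import List


def forward(weights: List[float], x: float) -> List[float]:
    """Return the input followed by the output of each scalar layer."""
    activations = [x]
    for w in weights:
        activations.append(w * activations[-1])
    return activations


def train_step_unfused(weights: List[float], x: float, lr: float) -> List[float]:
    """Conventional order: compute every gradient, then update every weight."""
    acts = forward(weights, x)
    upstream = 1.0  # d(loss)/d(output) with loss = output
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grads[i] = upstream * acts[i]
        upstream = upstream * weights[i]
    return [w - lr * g for w, g in zip(weights, grads)]


def train_step_fused(weights: List[float], x: float, lr: float) -> List[float]:
    """Fused order: update each weight right after its gradient is computed."""
    acts = forward(weights, x)
    upstream = 1.0
    new_weights = list(weights)
    for i in reversed(range(len(weights))):
        grad = upstream * acts[i]
        upstream = upstream * new_weights[i]  # propagate with the pre-update weight
        new_weights[i] = new_weights[i] - lr * grad  # update while the data is at hand
    return new_weights


initial = [2.0, 3.0]
print(train_step_unfused(initial, 1.0, 0.1))
print(train_step_fused(initial, 1.0, 0.1))
```

The key constraint in the fused version is that the upstream gradient is propagated with the old weight before that weight is overwritten, which keeps the result identical to the unfused step.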