Alert button
Picture for Bei Yu

Bei Yu

Alert button

TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices

Nov 03, 2023
Jianlei Yang, Jiacheng Liao, Fanding Lei, Meichen Liu, Junyi Chen, Lingkun Long, Han Wan, Bei Yu, Weisheng Zhao

Developing deep learning models on tiny devices (e.g. Microcontroller units, MCUs) has attracted much attention in various embedded IoT applications. However, it is challenging to efficiently design and deploy recent advanced models (e.g. transformers) on tiny devices due to their severe hardware resource constraints. In this work, we propose TinyFormer, a framework specifically designed to develop and deploy resource-efficient transformers on MCUs. TinyFormer mainly consists of SuperNAS, SparseNAS and SparseEngine. Separately, SuperNAS aims to search for an appropriate supernet from a vast search space. SparseNAS evaluates the best sparse single-path model including transformer architecture from the identified supernet. Finally, SparseEngine efficiently deploys the searched sparse models onto MCUs. To the best of our knowledge, SparseEngine is the first deployment framework capable of performing inference of sparse models with transformer on MCUs. Evaluation results on the CIFAR-10 dataset demonstrate that TinyFormer can develop efficient transformers with an accuracy of $96.1\%$ while adhering to hardware constraints of $1$MB storage and $320$KB memory. Additionally, TinyFormer achieves significant speedups in sparse inference, up to $12.2\times$, when compared to the CMSIS-NN library. TinyFormer is believed to bring powerful transformers into TinyML scenarios and greatly expand the scope of deep learning applications.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 
Viaarxiv icon

On the Evaluation of Generative Models in Distributed Learning Tasks

Oct 18, 2023
Zixiao Wang, Farzan Farnia, Zhenghao Lin, Yunheng Shen, Bei Yu

Figure 1 for On the Evaluation of Generative Models in Distributed Learning Tasks
Figure 2 for On the Evaluation of Generative Models in Distributed Learning Tasks
Figure 3 for On the Evaluation of Generative Models in Distributed Learning Tasks
Figure 4 for On the Evaluation of Generative Models in Distributed Learning Tasks

The evaluation of deep generative models including generative adversarial networks (GANs) and diffusion models has been extensively studied in the literature. While the existing evaluation methods mainly target a centralized learning problem with training data stored by a single client, many applications of generative models concern distributed learning settings, e.g. the federated learning scenario, where training data are collected by and distributed among several clients. In this paper, we study the evaluation of generative models in distributed learning tasks with heterogeneous data distributions. First, we focus on the Fr\'echet inception distance (FID) and consider the following FID-based aggregate scores over the clients: 1) FID-avg as the mean of clients' individual FID scores, 2) FID-all as the FID distance of the trained model to the collective dataset containing all clients' data. We prove that the model rankings according to the FID-all and FID-avg scores could be inconsistent, which can lead to different optimal generative models according to the two aggregate scores. Next, we consider the kernel inception distance (KID) and similarly define the KID-avg and KID-all aggregations. Unlike the FID case, we prove that KID-all and KID-avg result in the same rankings of generative models. We perform several numerical experiments on standard image datasets and training schemes to support our theoretical findings on the evaluation of generative models in distributed learning problems.

* 17 pages, 10 figures 
Viaarxiv icon

ChatEDA: A Large Language Model Powered Autonomous Agent for EDA

Aug 20, 2023
Zhuolun He, Haoyuan Wu, Xinyun Zhang, Xufeng Yao, Su Zheng, Haisheng Zheng, Bei Yu

Figure 1 for ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
Figure 2 for ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
Figure 3 for ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
Figure 4 for ChatEDA: A Large Language Model Powered Autonomous Agent for EDA

The integration of a complex set of Electronic Design Automation (EDA) tools to enhance interoperability is a critical concern for circuit designers. Recent advancements in large language models (LLMs) have showcased their exceptional capabilities in natural language processing and comprehension, offering a novel approach to interfacing with EDA tools. This research paper introduces ChatEDA, an autonomous agent for EDA empowered by a large language model, AutoMage, complemented by EDA tools serving as executors. ChatEDA streamlines the design flow from the Register-Transfer Level (RTL) to the Graphic Data System Version II (GDSII) by effectively managing task planning, script generation, and task execution. Through comprehensive experimental evaluations, ChatEDA has demonstrated its proficiency in handling diverse requirements, and our fine-tuned AutoMage model has exhibited superior performance compared to GPT-4 and other similar LLMs.

Viaarxiv icon

Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks

May 23, 2023
Peng Xu, Lin Zhang, Xuanzhou Liu, Jiaqi Sun, Yue Zhao, Haiqing Yang, Bei Yu

Figure 1 for Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks
Figure 2 for Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks
Figure 3 for Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks
Figure 4 for Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks

Neural architecture search (NAS) for Graph neural networks (GNNs), called NAS-GNNs, has achieved significant performance over manually designed GNN architectures. However, these methods inherit issues from the conventional NAS methods, such as high computational cost and optimization difficulty. More importantly, previous NAS methods have ignored the uniqueness of GNNs, where GNNs possess expressive power without training. With the randomly-initialized weights, we can then seek the optimal architecture parameters via the sparse coding objective and derive a novel NAS-GNNs method, namely neural architecture coding (NAC). Consequently, our NAC holds a no-update scheme on GNNs and can efficiently compute in linear time. Empirical evaluations on multiple GNN benchmark datasets demonstrate that our approach leads to state-of-the-art performance, which is up to $200\times$ faster and $18.8\%$ more accurate than the strong baselines.

Viaarxiv icon

Decoupled Kullback-Leibler Divergence Loss

May 23, 2023
Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu, Hanwang Zhang

Figure 1 for Decoupled Kullback-Leibler Divergence Loss
Figure 2 for Decoupled Kullback-Leibler Divergence Loss
Figure 3 for Decoupled Kullback-Leibler Divergence Loss
Figure 4 for Decoupled Kullback-Leibler Divergence Loss

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and observe that it is equivalent to the Doupled Kullback-Leibler (DKL) Divergence loss that consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels. From our analysis of the DKL loss, we have identified two areas for improvement. Firstly, we address the limitation of DKL in scenarios like knowledge distillation by breaking its asymmetry property in training optimization. This modification ensures that the wMSE component is always effective during training, providing extra constructive cues. Secondly, we introduce global information into DKL for intra-class consistency regularization. With these two enhancements, we derive the Improved Kullback-Leibler (IKL) Divergence loss and evaluate its effectiveness by conducting experiments on CIFAR-10/100 and ImageNet datasets, focusing on adversarial training and knowledge distillation tasks. The proposed approach achieves new state-of-the-art performance on both tasks, demonstrating the substantial practical merits. Code and models will be available soon at https://github.com/jiequancui/DKL.

* under review 
Viaarxiv icon

Towards Versatile and Efficient Visual Knowledge Injection into Pre-trained Language Models with Cross-Modal Adapters

May 12, 2023
Xinyun Zhang, Haochen Tan, Han Wu, Mingjie Zhan, Ding Liang, Bei Yu

Figure 1 for Towards Versatile and Efficient Visual Knowledge Injection into Pre-trained Language Models with Cross-Modal Adapters
Figure 2 for Towards Versatile and Efficient Visual Knowledge Injection into Pre-trained Language Models with Cross-Modal Adapters
Figure 3 for Towards Versatile and Efficient Visual Knowledge Injection into Pre-trained Language Models with Cross-Modal Adapters
Figure 4 for Towards Versatile and Efficient Visual Knowledge Injection into Pre-trained Language Models with Cross-Modal Adapters

Humans learn language via multi-modal knowledge. However, due to the text-only pre-training scheme, most existing pre-trained language models (PLMs) are hindered from the multi-modal information. To inject visual knowledge into PLMs, existing methods incorporate either the text or image encoder of vision-language models (VLMs) to encode the visual information and update all the original parameters of PLMs for knowledge fusion. In this paper, we propose a new plug-and-play module, X-adapter, to flexibly leverage the aligned visual and textual knowledge learned in pre-trained VLMs and efficiently inject them into PLMs. Specifically, we insert X-adapters into PLMs, and only the added parameters are updated during adaptation. To fully exploit the potential in VLMs, X-adapters consist of two sub-modules, V-expert and T-expert, to fuse VLMs' image and text representations, respectively. We can opt for activating different sub-modules depending on the downstream tasks. Experimental results show that our method can significantly improve the performance on object-color reasoning and natural language understanding (NLU) tasks compared with PLM baselines.

Viaarxiv icon

Physics-Informed Optical Kernel Regression Using Complex-valued Neural Fields

Apr 02, 2023
Guojin Chen, Zehua Pei, Haoyu Yang, Yuzhe Ma, Bei Yu, Martin D. F. Wong

Figure 1 for Physics-Informed Optical Kernel Regression Using Complex-valued Neural Fields
Figure 2 for Physics-Informed Optical Kernel Regression Using Complex-valued Neural Fields
Figure 3 for Physics-Informed Optical Kernel Regression Using Complex-valued Neural Fields
Figure 4 for Physics-Informed Optical Kernel Regression Using Complex-valued Neural Fields

Lithography is fundamental to integrated circuit fabrication, necessitating large computation overhead. The advancement of machine learning (ML)-based lithography models alleviates the trade-offs between manufacturing process expense and capability. However, all previous methods regard the lithography system as an image-to-image black box mapping, utilizing network parameters to learn by rote mappings from massive mask-to-aerial or mask-to-resist image pairs, resulting in poor generalization capability. In this paper, we propose a new ML-based paradigm disassembling the rigorous lithographic model into non-parametric mask operations and learned optical kernels containing determinant source, pupil, and lithography information. By optimizing complex-valued neural fields to perform optical kernel regression from coordinates, our method can accurately restore lithography system using a small-scale training dataset with fewer parameters, demonstrating superior generalization capability as well. Experiments show that our framework can use 31% of parameters while achieving 69$\times$ smaller mean squared error with 1.3$\times$ higher throughput than the state-of-the-art.

* Accepted by DAC23 
Viaarxiv icon

GPU-accelerated Matrix Cover Algorithm for Multiple Patterning Layout Decomposition

Mar 25, 2023
Guojin Chen, Haoyu Yang, Bei Yu

Figure 1 for GPU-accelerated Matrix Cover Algorithm for Multiple Patterning Layout Decomposition
Figure 2 for GPU-accelerated Matrix Cover Algorithm for Multiple Patterning Layout Decomposition
Figure 3 for GPU-accelerated Matrix Cover Algorithm for Multiple Patterning Layout Decomposition
Figure 4 for GPU-accelerated Matrix Cover Algorithm for Multiple Patterning Layout Decomposition

Multiple patterning lithography (MPL) is regarded as one of the most promising ways of overcoming the resolution limitations of conventional optical lithography due to the delay of next-generation lithography technology. As the feature size continues to decrease, layout decomposition for multiple patterning lithography (MPLD) technology is becoming increasingly crucial for improving the manufacturability in advanced nodes. The decomposition process refers to assigning the layout features to different mask layers according to the design rules and density requirements. When the number of masks $k \geq 3$, the MPLD problems are NP-hard and thus may suffer from runtime overhead for practical designs. However, the number of layout patterns is increasing exponentially in industrial layouts, which hinders the runtime performance of MPLD models. In this research, we substitute the CPU's dance link data structure with parallel GPU matrix operations to accelerate the solution for exact cover-based MPLD algorithms. Experimental results demonstrate that our system is capable of full-scale, lightning-fast layout decomposition, which can achieve more than 10$\times$ speed-up without quality degradation compared to state-of-the-art layout decomposition methods.

* Accepted by SPIE23 
Viaarxiv icon

DiffPattern: Layout Pattern Generation via Discrete Diffusion

Mar 23, 2023
Zixiao Wang, Yunheng Shen, Wenqian Zhao, Yang Bai, Guojin Chen, Farzan Farnia, Bei Yu

Figure 1 for DiffPattern: Layout Pattern Generation via Discrete Diffusion
Figure 2 for DiffPattern: Layout Pattern Generation via Discrete Diffusion
Figure 3 for DiffPattern: Layout Pattern Generation via Discrete Diffusion
Figure 4 for DiffPattern: Layout Pattern Generation via Discrete Diffusion

Deep generative models dominate the existing literature in layout pattern generation. However, leaving the guarantee of legality to an inexplicable neural network could be problematic in several applications. In this paper, we propose \tool{DiffPattern} to generate reliable layout patterns. \tool{DiffPattern} introduces a novel diverse topology generation method via a discrete diffusion model with compute-efficiently lossless layout pattern representation. Then a white-box pattern assessment is utilized to generate legal patterns given desired design rules. Our experiments on several benchmark settings show that \tool{DiffPattern} significantly outperforms existing baselines and is capable of synthesizing reliable layout patterns.

* DAC2023 Accepted 
Viaarxiv icon