Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Husheng Han

AutoPPA: Automated Circuit PPA Optimization via Contrastive Code-based Rule Library Learning

Apr 20, 2026

Chongxiao Li, Pengwei Jin, Di Huang, Guangrun Sun, Husheng Han, Jianan Mu, Xinyao Zheng, Jiaguo Zhu, Shuyi Xing, Hanjun Wei(+7 more)

Abstract:Performance, power, and area (PPA) optimization is a fundamental task in RTL design, requiring a precise understanding of circuit functionality and the relationship between circuit structures and PPA metrics. Recent studies attempt to automate this process using LLMs, but neither feedback-based nor knowledge-based methods are efficient enough, as they either design without any prior knowledge or rely heavily on human-summarized optimization rules. In this paper, we propose AutoPPA, a fully automated PPA optimization framework. The key idea is to automatically generate optimization rules that enhance the search for optimal solutions. To do this, AutoPPA employs an Explore-Evaluate-Induce ($E^2I$) workflow that contrasts and abstracts rules from diverse generated code pairs rather than manually defined prior knowledge, yielding better optimization patterns. To make the abstracted rules more generalizable, AutoPPA employs an adaptive multi-step search framework that adopts the most effective rules for a given circuit. Experiments show that AutoPPA outperforms both the manual optimization and the state-of-the-art methods SymRTLO and RTLRewriter.

Via

Access Paper or Ask Questions

TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

Jul 12, 2024

Husheng Han, Xinyao Zheng, Yuanbo Wen, Yifan Hao, Erhu Feng, Ling Liang, Jianan Mu, Xiaqing Li, Tianyun Ma, Pengwei Jin(+4 more)

Figure 1 for TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

Figure 2 for TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

Figure 3 for TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

Figure 4 for TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

Abstract:Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEE) is considered a promising solution because of its comparatively lower overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing due to fine and different memory granularities between CPU and NPU. 1) The cacheline granularity of CPU TEE intensifies memory pressure due to its extra memory access, and 2) the cacheline granularity MAC of NPU escalates the pressure on the limited memory storage. 3) Data transfer across heterogeneous enclaves relies on the transit of non-secure regions, resulting in cumbersome re-encryption and scheduling. To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in CPU TEE to eliminate the off-chip metadata access by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption and scheduling dilemmas. Our evaluation is built on enhanced Gem5 and a cycle-accurate NPU simulator. The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training.

* Accepted by ASPLOS 2024

Via

Access Paper or Ask Questions

Pushing the Limits of Machine Design: Automated CPU Design with AI

Jun 27, 2023

Shuyao Cheng, Pengwei Jin, Qi Guo, Zidong Du, Rui Zhang, Yunhao Tian, Xing Hu, Yongwei Zhao, Yifan Hao, Xiangtao Guan(+9 more)

Figure 1 for Pushing the Limits of Machine Design: Automated CPU Design with AI

Figure 2 for Pushing the Limits of Machine Design: Automated CPU Design with AI

Figure 3 for Pushing the Limits of Machine Design: Automated CPU Design with AI

Figure 4 for Pushing the Limits of Machine Design: Automated CPU Design with AI

Abstract:Design activity -- constructing an artifact description satisfying given goals and constraints -- distinguishes humanity from other animals and traditional machines, and endowing machines with design abilities at the human level or beyond has been a long-term pursuit. Though machines have already demonstrated their abilities in designing new materials, proteins, and computer programs with advanced artificial intelligence (AI) techniques, the search space for designing such objects is relatively small, and thus, "Can machines design like humans?" remains an open question. To explore the boundary of machine design, here we present a new AI approach to automatically design a central processing unit (CPU), the brain of a computer, and one of the world's most intricate devices humanity have ever designed. This approach generates the circuit logic, which is represented by a graph structure called Binary Speculation Diagram (BSD), of the CPU design from only external input-output observations instead of formal program code. During the generation of BSD, Monte Carlo-based expansion and the distance of Boolean functions are used to guarantee accuracy and efficiency, respectively. By efficiently exploring a search space of unprecedented size 10^{10^{540}}, which is the largest one of all machine-designed objects to our best knowledge, and thus pushing the limits of machine design, our approach generates an industrial-scale RISC-V CPU within only 5 hours. The taped-out CPU successfully runs the Linux operating system and performs comparably against the human-designed Intel 80486SX CPU. In addition to learning the world's first CPU only from input-output observations, which may reform the semiconductor industry by significantly reducing the design cycle, our approach even autonomously discovers human knowledge of the von Neumann architecture.

* 28 pages

Via

Access Paper or Ask Questions

Flew Over Learning Trap: Learn Unlearnable Samples by Progressive Staged Training

Jun 03, 2023

Pucheng Dang, Xing Hu, Kaidi Xu, Jinhao Duan, Di Huang, Husheng Han, Rui Zhang, Zidong Du, Qi Guo, Yunji Chen

Figure 1 for Flew Over Learning Trap: Learn Unlearnable Samples by Progressive Staged Training

Figure 2 for Flew Over Learning Trap: Learn Unlearnable Samples by Progressive Staged Training

Figure 3 for Flew Over Learning Trap: Learn Unlearnable Samples by Progressive Staged Training

Figure 4 for Flew Over Learning Trap: Learn Unlearnable Samples by Progressive Staged Training

Abstract:Unlearning techniques are proposed to prevent third parties from exploiting unauthorized data, which generate unlearnable samples by adding imperceptible perturbations to data for public publishing. These unlearnable samples effectively misguide model training to learn perturbation features but ignore image semantic features. We make the in-depth analysis and observe that models can learn both image features and perturbation features of unlearnable samples at an early stage, but rapidly go to the overfitting stage since the shallow layers tend to overfit on perturbation features and make models fall into overfitting quickly. Based on the observations, we propose Progressive Staged Training to effectively prevent models from overfitting in learning perturbation features. We evaluated our method on multiple model architectures over diverse datasets, e.g., CIFAR-10, CIFAR-100, and ImageNet-mini. Our method circumvents the unlearnability of all state-of-the-art methods in the literature and provides a reliable baseline for further evaluation of unlearnable techniques.

Via

Access Paper or Ask Questions

Real-Time Robust Video Object Detection System Against Physical-World Adversarial Attacks

Aug 19, 2022

Husheng Han, Xing Hu, Kaidi Xu, Pucheng Dang, Ying Wang, Yongwei Zhao, Zidong Du, Qi Guo, Yanzhi Yang, Tianshi Chen

Figure 1 for Real-Time Robust Video Object Detection System Against Physical-World Adversarial Attacks

Figure 2 for Real-Time Robust Video Object Detection System Against Physical-World Adversarial Attacks

Figure 3 for Real-Time Robust Video Object Detection System Against Physical-World Adversarial Attacks

Figure 4 for Real-Time Robust Video Object Detection System Against Physical-World Adversarial Attacks

Abstract:DNN-based video object detection (VOD) powers autonomous driving and video surveillance industries with rising importance and promising opportunities. However, adversarial patch attack yields huge concern in live vision tasks because of its practicality, feasibility, and powerful attack effectiveness. This work proposes Themis, a software/hardware system to defend against adversarial patches for real-time robust video object detection. We observe that adversarial patches exhibit extremely localized superficial feature importance in a small region with non-robust predictions, and thus propose the adversarial region detection algorithm for adversarial effect elimination. Themis also proposes a systematic design to efficiently support the algorithm by eliminating redundant computations and memory traffics. Experimental results show that the proposed methodology can effectively recover the system from the adversarial attack with negligible hardware overhead.

Via

Access Paper or Ask Questions

ScaleCert: Scalable Certified Defense against Adversarial Patches with Sparse Superficial Layers

Nov 04, 2021

Husheng Han, Kaidi Xu, Xing Hu, Xiaobing Chen, Ling Liang, Zidong Du, Qi Guo, Yanzhi Wang, Yunji Chen

Figure 1 for ScaleCert: Scalable Certified Defense against Adversarial Patches with Sparse Superficial Layers

Figure 2 for ScaleCert: Scalable Certified Defense against Adversarial Patches with Sparse Superficial Layers

Figure 3 for ScaleCert: Scalable Certified Defense against Adversarial Patches with Sparse Superficial Layers

Figure 4 for ScaleCert: Scalable Certified Defense against Adversarial Patches with Sparse Superficial Layers

Abstract:Adversarial patch attacks that craft the pixels in a confined region of the input images show their powerful attack effectiveness in physical environments even with noises or deformations. Existing certified defenses towards adversarial patch attacks work well on small images like MNIST and CIFAR-10 datasets, but achieve very poor certified accuracy on higher-resolution images like ImageNet. It is urgent to design both robust and effective defenses against such a practical and harmful attack in industry-level larger images. In this work, we propose the certified defense methodology that achieves high provable robustness for high-resolution images and largely improves the practicality for real adoption of the certified defense. The basic insight of our work is that the adversarial patch intends to leverage localized superficial important neurons (SIN) to manipulate the prediction results. Hence, we leverage the SIN-based DNN compression techniques to significantly improve the certified accuracy, by reducing the adversarial region searching overhead and filtering the prediction noises. Our experimental results show that the certified accuracy is increased from 36.3% (the state-of-the-art certified detection) to 60.4% on the ImageNet dataset, largely pushing the certified defenses for practical use.

* Accepted at NeurIPS 2021

Via

Access Paper or Ask Questions