Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Olivia Weng

UC San Diego

Design Rules for Extreme-Edge Scientific Computing on AI Engines

Apr 21, 2026

Zhenghua Ma, G Abarajithan, Dimitrios Danopoulos, Olivia Weng, Francesco Restuccia, Ryan Kastner

Abstract:Extreme-edge scientific applications use machine learning models to analyze sensor data and make real-time decisions. Their stringent latency and throughput requirements demand small batch sizes and require that model weights remain fully on-chip. Spatial dataflow implementations are common for extreme-edge applications. Spatial dataflow works well for small networks, but it fails to scale to larger models due to inherent resource scaling limitations. AI Engines on modern FPGA SoCs offer a promising alternative with high compute density and additional on-chip memory. However, the architecture, programming model, and performance-scaling behavior of AI Engines differ fundamentally from those of the programmable logic, making direct comparison non-trivial and the benefits of using AI Engines unclear. This work addresses how and when extreme-edge scientific neural networks should be implemented on AI Engines versus programmable logic. We provide systematic architectural characterization and micro-benchmarking and introduce a latency-adjusted resource equivalence (LARE) metric that identifies when AI Engine implementations outperform programmable logic designs. We further propose spatial and API-level dataflow optimizations tailored to low-latency scientific inference. Finally, we demonstrate the successful deployment of end-to-end neural networks on AI Engines that cannot fit on programmable logic when using the hlsml toolchain.

Via

Access Paper or Ask Questions

Loss Landscape Analysis for Reliable Quantized ML Models for Scientific Sensing

Feb 12, 2025

Tommaso Baldi, Javier Campos, Olivia Weng, Caleb Geniesse, Nhan Tran, Ryan Kastner, Alessandro Biondi

Abstract:In this paper, we propose a method to perform empirical analysis of the loss landscape of machine learning (ML) models. The method is applied to two ML models for scientific sensing, which necessitates quantization to be deployed and are subject to noise and perturbations due to experimental conditions. Our method allows assessing the robustness of ML models to such effects as a function of quantization precision and under different regularization techniques -- two crucial concerns that remained underexplored so far. By investigating the interplay between performance, efficiency, and robustness by means of loss landscape analysis, we both established a strong correlation between gently-shaped landscapes and robustness to input and weight perturbations and observed other intriguing and non-obvious phenomena. Our method allows a systematic exploration of such trade-offs a priori, i.e., without training and testing multiple models, leading to more efficient development workflows. This work also highlights the importance of incorporating robustness into the Pareto optimization of ML models, enabling more reliable and adaptive scientific sensing systems.

* Under review

Via

Access Paper or Ask Questions

Reliable edge machine learning hardware for scientific applications

Jun 27, 2024

Tommaso Baldi, Javier Campos, Ben Hawks, Jennifer Ngadiuba, Nhan Tran, Daniel Diaz, Javier Duarte, Ryan Kastner, Andres Meza, Melissa Quinnan(+8 more)

Figure 1 for Reliable edge machine learning hardware for scientific applications

Figure 2 for Reliable edge machine learning hardware for scientific applications

Figure 3 for Reliable edge machine learning hardware for scientific applications

Figure 4 for Reliable edge machine learning hardware for scientific applications

Abstract:Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling ultra-fine-grained model inspection for efficient fault tolerance. We discuss approaches to developing and validating reliable algorithms at the scientific edge under such strict latency, resource, power, and area requirements in extreme experimental environments. We study metrics for developing robust algorithms, present preliminary results and mitigation strategies, and conclude with an outlook of these and future directions of research towards the longer-term goal of developing autonomous scientific experimentation methods for accelerated scientific discovery.

* IEEE VLSI Test Symposium 2024 (VTS)

Via

Access Paper or Ask Questions

Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications

Mar 13, 2024

Olivia Weng, Alexander Redding, Nhan Tran, Javier Mauricio Duarte, Ryan Kastner

Figure 1 for Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications

Figure 2 for Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications

Figure 3 for Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications

Abstract:With more scientific fields relying on neural networks (NNs) to process data incoming at extreme throughputs and latencies, it is crucial to develop NNs with all their parameters stored on-chip. In many of these applications, there is not enough time to go off-chip and retrieve weights. Even more so, off-chip memory such as DRAM does not have the bandwidth required to process these NNs as fast as the data is being produced (e.g., every 25 ns). As such, these extreme latency and bandwidth requirements have architectural implications for the hardware intended to run these NNs: 1) all NN parameters must fit on-chip, and 2) codesigning custom/reconfigurable logic is often required to meet these latency and bandwidth constraints. In our work, we show that many scientific NN applications must run fully on chip, in the extreme case requiring a custom chip to meet such stringent constraints.

Via

Access Paper or Ask Questions

Tailor: Altering Skip Connections for Resource-Efficient Inference

Jan 18, 2023

Olivia Weng, Gabriel Marcano, Vladimir Loncar, Alireza Khodamoradi, Nojan Sheybani, Farinaz Koushanfar, Kristof Denolf, Javier Mauricio Duarte, Ryan Kastner

Figure 1 for Tailor: Altering Skip Connections for Resource-Efficient Inference

Figure 2 for Tailor: Altering Skip Connections for Resource-Efficient Inference

Figure 3 for Tailor: Altering Skip Connections for Resource-Efficient Inference

Figure 4 for Tailor: Altering Skip Connections for Resource-Efficient Inference

Abstract:Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements. In this paper, we show that skip connections can be optimized for hardware when tackled with a hardware-software codesign approach. We argue that while a network's skip connections are needed for the network to learn, they can later be removed or shortened to provide a more hardware efficient implementation with minimal to no accuracy loss. We introduce Tailor, a codesign tool whose hardware-aware training algorithm gradually removes or shortens a fully trained network's skip connections to lower their hardware cost. The optimized hardware designs improve resource utilization by up to 34% for BRAMs, 13% for FFs, and 16% for LUTs.

* 12 pages, 11 figures

Via

Access Paper or Ask Questions

Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

Jun 23, 2022

Hendrik Borras, Giuseppe Di Guglielmo, Javier Duarte, Nicolò Ghielmetti, Ben Hawks, Scott Hauck, Shih-Chieh Hsu, Ryan Kastner, Jason Liang, Andres Meza(+8 more)

Figure 1 for Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

Figure 2 for Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

Figure 3 for Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

Figure 4 for Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

Abstract:We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classification benchmark tasks. The resulting hardware implementations are quantized, configurable, spatial dataflow architectures tailored for speed and efficiency and introduce new generic optimizations and common workflows developed as a part of this work. The full workflow is presented from quantization-aware training to FPGA implementation. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms. The resulting submissions achieve latencies as low as 20 $\mu$s and energy consumption as low as 30 $\mu$J per inference. We demonstrate how emerging ML benchmarks on heterogeneous hardware platforms can catalyze collaboration and the development of new techniques and more accessible tools.

* 15 pages, 7 figures, Contribution to 3rd Workshop on Benchmarking Machine Learning Workloads on Emerging Hardware (MLBench) at 5th Conference on Machine Learning and Systems (MLSys)

Via

Access Paper or Ask Questions

Neural Network Quantization for Efficient Inference: A Survey

Dec 08, 2021

Olivia Weng

Figure 1 for Neural Network Quantization for Efficient Inference: A Survey

Figure 2 for Neural Network Quantization for Efficient Inference: A Survey

Figure 3 for Neural Network Quantization for Efficient Inference: A Survey

Figure 4 for Neural Network Quantization for Efficient Inference: A Survey

Abstract:As neural networks have become more powerful, there has been a rising desire to deploy them in the real world; however, the power and accuracy of neural networks is largely due to their depth and complexity, making them difficult to deploy, especially in resource-constrained devices. Neural network quantization has recently arisen to meet this demand of reducing the size and complexity of neural networks by reducing the precision of a network. With smaller and simpler networks, it becomes possible to run neural networks within the constraints of their target hardware. This paper surveys the many neural network quantization techniques that have been developed in the last decade. Based on this survey and comparison of neural network quantization techniques, we propose future directions of research in the area.

* 13 pages

Via

Access Paper or Ask Questions

Applications and Techniques for Fast Machine Learning in Science

Oct 25, 2021

Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer(+77 more)

Figure 1 for Applications and Techniques for Fast Machine Learning in Science

Figure 2 for Applications and Techniques for Fast Machine Learning in Science

Figure 3 for Applications and Techniques for Fast Machine Learning in Science

Figure 4 for Applications and Techniques for Fast Machine Learning in Science

Abstract:In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.

* 66 pages, 13 figures, 5 tables

Via

Access Paper or Ask Questions

Hardware-efficient Residual Networks for FPGAs

Feb 02, 2021

Olivia Weng, Alireza Khodamoradi, Ryan Kastner

Figure 1 for Hardware-efficient Residual Networks for FPGAs

Figure 2 for Hardware-efficient Residual Networks for FPGAs

Figure 3 for Hardware-efficient Residual Networks for FPGAs

Abstract:Residual networks (ResNets) employ skip connections in their networks -- reusing activations from previous layers -- to improve training convergence, but these skip connections create challenges for hardware implementations of ResNets. The hardware must either wait for skip connections to be processed before processing more incoming data or buffer them elsewhere. Without skip connections, ResNets would be more hardware-efficient. Thus, we present the teacher-student learning method to gradually prune away all of a ResNet's skip connections, constructing a network we call NonResNet. We show that when implemented for FPGAs, NonResNet decreases ResNet's BRAM utilization by 9% and LUT utilization by 3% and increases throughput by 5%.

* Presented at DATE Friday Workshop on System-level Design Methods for Deep Learning on Heterogeneous Architectures (SLOHA 2021) (arXiv:2102.00818)

Via

Access Paper or Ask Questions