Abstract:Machine unlearning, driven by privacy regulations and the "right to be forgotten", is increasingly needed at the edge, yet server-centric or retraining-heavy methods are impractical under tight computation and energy budgets. We present FiCABU (Fisher-based Context-Adaptive Balanced Unlearning), a software-hardware co-design that brings unlearning to edge AI processors. FiCABU combines (i) Context-Adaptive Unlearning, which begins edits from back-end layers and halts once the target forgetting is reached, with (ii) Balanced Dampening, which scales dampening strength by depth to preserve retain accuracy. These methods are realized in a full RTL design of a RISC-V edge AI processor that integrates two lightweight IPs for Fisher estimation and dampening into a GEMM-centric streaming pipeline, validated on an FPGA prototype and synthesized in 45 nm for power analysis. Across CIFAR-20 and PinsFaceRecognition with ResNet-18 and ViT, FiCABU achieves random-guess forget accuracy while matching the retraining-free Selective Synaptic Dampening (SSD) baseline on retain accuracy, reducing computation by up to 87.52 percent (ResNet-18) and 71.03 percent (ViT). On the INT8 hardware prototype, FiCABU further improves retain preservation and reduces energy to 6.48 percent (CIFAR-20) and 0.13 percent (PinsFaceRecognition) of the SSD baseline. In sum, FiCABU demonstrates that back-end-first, depth-aware unlearning can be made both practical and efficient for resource-constrained edge AI devices.
Abstract:Multi-channel keyword spotting (KWS) has become crucial for voice-based applications in edge environments. However, its substantial computational and energy requirements pose significant challenges. We introduce ASAP-FE (Agile Sparsity-Aware Parallelized-Feature Extractor), a hardware-oriented front-end designed to address these challenges. Our framework incorporates three key innovations: (1) Half-overlapped Infinite Impulse Response (IIR) Framing: This reduces redundant data by approximately 25% while maintaining essential phoneme transition cues. (2) Sparsity-aware Data Reduction: We exploit frame-level sparsity to achieve an additional 50% data reduction by combining frame skipping with stride-based filtering. (3) Dynamic Parallel Processing: We introduce a parameterizable filter cluster and a priority-based scheduling algorithm that allows parallel execution of IIR filtering tasks, reducing latency and optimizing energy efficiency. ASAP-FE is implemented with various filter cluster sizes on edge processors, with functionality verified on FPGA prototypes and designs synthesized at 45 nm. Experimental results using TC-ResNet8, DS-CNN, and KWT-1 demonstrate that ASAP-FE reduces the average workload by 62.73% while supporting real-time processing for up to 32 channels. Compared to a conventional fully overlapped baseline, ASAP-FE achieves less than a 1% accuracy drop (e.g., 96.22% vs. 97.13% for DS-CNN), which is well within acceptable limits for edge AI. By adjusting the number of filter modules, our design optimizes the trade-off between performance and energy, with 15 parallel filters providing optimal performance for up to 25 channels. Overall, ASAP-FE offers a practical and efficient solution for multi-channel KWS on energy-constrained edge devices.