Machine unlearning strives to uphold the data owners' right to be forgotten by enabling models to selectively forget specific data. Recent methods suggest that one approach of data forgetting is by precomputing and storing statistics carrying second-order information to improve computational and memory efficiency. However, they rely on restrictive assumptions and the computation/storage suffer from the curse of model parameter dimensionality, making it challenging to apply to most deep neural networks. In this work, we propose a Hessian-free online unlearning method. We propose to maintain a statistical vector for each data point, computed through affine stochastic recursion approximation of the difference between retrained and learned models. Our proposed algorithm achieves near-instantaneous online unlearning as it only requires a vector addition operation. Based on the strategy that recollecting statistics for forgetting data, the proposed method significantly reduces the unlearning runtime. Experimental studies demonstrate that the proposed scheme surpasses existing results by orders of magnitude in terms of time and memory costs, while also enhancing accuracy.
Large Language Models (LLMs) have presented impressive performance across several transformative tasks. However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs, often riddled with numerous challenges such as frequent hardware failures, intricate parallelization strategies, and imbalanced resource utilization. In this paper, we present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme. Specifically, we investigate discrepancies between LLMs and prior task-specific Deep Learning (DL) workloads, explore resource utilization patterns, and identify the impact of various job failures. Our analysis summarizes hurdles we encountered and uncovers potential opportunities to optimize systems tailored for LLMs. Furthermore, we introduce our system efforts: (1) fault-tolerant pretraining, which enhances fault tolerance through LLM-involved failure diagnosis and automatic recovery. (2) decoupled scheduling for evaluation, which achieves timely performance feedback via trial decomposition and scheduling optimization.
Load frequency control (LFC) is widely employed in power systems to stabilize frequency fluctuation and guarantee power quality. However, most existing LFC methods rely on accurate power system modeling and usually ignore the nonlinear characteristics of the system, limiting controllers' performance. To solve these problems, this paper proposes a model-free LFC method for nonlinear power systems based on deep deterministic policy gradient (DDPG) framework. The proposed method establishes an emulator network to emulate power system dynamics. After defining the action-value function, the emulator network is applied for control actions evaluation instead of the critic network. Then the actor network controller is effectively optimized by estimating the policy gradient based on zeroth-order optimization (ZOO) and backpropagation algorithm. Simulation results and corresponding comparisons demonstrate the designed controller can generate appropriate control actions and has strong adaptability for nonlinear power systems.
On-demand controlled drug delivery is essential for the treatment of a wide range of chronic diseases. As the drug is released at the time when required, its efficacy is boosted and the side effects are minimized. However, so far, drug delivery devices often rely on the passive diffusion process for a sustained release, which is slow and uncontrollable. Here, we present a miniaturized microfluidic device for wirelessly controlled ultrafast active drug delivery, driven by an oscillating solid-liquid interface. The oscillation generates acoustic streaming in the drug reservoir, which opens an elastic valve to deliver the drug. High-speed microscopy reveals the fast response of the valve on the order of 1 ms, which is more than three orders of magnitude faster than the start-of-the-art. The amount of the released drug exhibits a linear relationship with the working time and the electric power applied to the ultrasonic resonator. The trigger of the release is wirelessly controlled via a magnetic field, and the system shows stable output in a continuous experiment for two weeks. The integrated system shows great promise as a long-term controlled drug delivery implant for chronic diseases.
Electronic-photonic computing systems offer immense potential in energy-efficient artificial intelligence (AI) acceleration tasks due to the superior computing speed and efficiency of optics, especially for real-time, low-energy deep neural network (DNN) inference tasks on resource-restricted edge platforms. However, current optical neural accelerators based on foundry-available devices and conventional system architecture still encounter a performance gap compared to highly customized electronic counterparts. To bridge the performance gap due to lack of domain specialization, we present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization. At the device level, we present foundry-compatible, customized photonic devices, including a slow-light electro-optic modulator with experimental demonstration, optical splitters, and phase shifters that significantly reduce the footprint and power in input encoding and dot-product calculation. At the circuit level, partial products are hierarchically accumulated via parallel photocurrent aggregation, lightweight capacitive temporal integration, and sequential digital summation, considerably relieving the analog-to-digital conversion bottleneck. We also employ a multi-tile, multi-core architecture to maximize hardware sharing for higher efficiency. Across diverse edge AI workloads, TeMPO delivers digital-comparable task accuracy with superior quantization/noise tolerance. We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$^2$ compute density, pushing the Pareto frontier in edge AI hardware. This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators with even greater performance and efficiency.
Mobile edge computing (MEC) is a promising paradigm for real-time applications with intensive computational needs (e.g., autonomous driving), as it can reduce the processing delay. In this work, we focus on the timeliness of computational-intensive updates, measured by Age-ofInformation (AoI), and study how to jointly optimize the task updating and offloading policies for AoI with fractional form. Specifically, we consider edge load dynamics and formulate a task scheduling problem to minimize the expected time-average AoI. The uncertain edge load dynamics, the nature of the fractional objective, and hybrid continuous-discrete action space (due to the joint optimization) make this problem challenging and existing approaches not directly applicable. To this end, we propose a fractional reinforcement learning(RL) framework and prove its convergence. We further design a model-free fractional deep RL (DRL) algorithm, where each device makes scheduling decisions with the hybrid action space without knowing the system dynamics and decisions of other devices. Experimental results show that our proposed algorithms reduce the average AoI by up to 57.6% compared with several non-fractional benchmarks.
Automated driving systems can be helpful in a wide range of societal challenges, e.g., mobility-on-demand and transportation logistics for last-mile delivery, by aiding the vehicle driver or taking over the responsibility for the dynamic driving task partially or completely. Ensuring the safety of automated driving systems is no trivial task, even more so for those systems of SAE Level 3 or above. To achieve this, mechanisms are needed that can continuously monitor the system's operating conditions, also denoted as the system's operational design domain. This paper presents a safety concept for automated driving systems which uses a combination of onboard runtime monitoring via connected dependability cage and off-board runtime monitoring via a remote command control center, to continuously monitor the system's ODD. On one side, the connected dependability cage fulfills a double functionality: (1) to monitor continuously the operational design domain of the automated driving system, and (2) to transfer the responsibility in a smooth and safe manner between the automated driving system and the off-board remote safety driver, who is present in the remote command control center. On the other side, the remote command control center enables the remote safety driver the monitoring and takeover of the vehicle's control. We evaluate our safety concept for automated driving systems in a lab environment and on a test field track and report on results and lessons learned.
Mobile edge computing (MEC) is essential for next-generation mobile network applications that prioritize various performance metrics, including delays and energy consumption. However, conventional single-objective scheduling solutions cannot be directly applied to practical systems in which the preferences of these applications (i.e., the weights of different objectives) are often unknown or challenging to specify in advance. In this study, we address this issue by formulating a multi-objective offloading problem for MEC with multiple edges to minimize expected long-term energy consumption and transmission delay while considering unknown preferences as parameters. To address the challenge of unknown preferences, we design a multi-objective (deep) reinforcement learning (MORL)-based resource scheduling scheme with proximal policy optimization (PPO). In addition, we introduce a well-designed state encoding method for constructing features for multiple edges in MEC systems, a sophisticated reward function for accurately computing the utilities of delay and energy consumption. Simulation results demonstrate that our proposed MORL scheme enhances the hypervolume of the Pareto front by up to 233.1% compared to benchmarks. Our full framework is available at https://github.com/gracefulning/mec_morl_multipolicy.
Exploiting the computational heterogeneity of mobile devices and edge nodes, mobile edge computation (MEC) provides an efficient approach to achieving real-time applications that are sensitive to information freshness, by offloading tasks from mobile devices to edge nodes. We use the metric Age-of-Information (AoI) to evaluate information freshness. An efficient solution to minimize the AoI for the MEC system with multiple users is non-trivial to obtain due to the random computing time. In this paper, we consider multiple users offloading tasks to heterogeneous edge servers in a MEC system. We first reformulate the problem as a Restless Multi-Arm-Bandit (RMAB) problem and establish a hierarchical Markov Decision Process (MDP) to characterize the updating of AoI for the MEC system. Based on the hierarchical MDP, we propose a nested index framework and design a nested index policy with provably asymptotic optimality. Finally, the closed form of the nested index is obtained, which enables the performance tradeoffs between computation complexity and accuracy. Our algorithm leads to an optimality gap reduction of up to 40%, compared to benchmarks. Our algorithm asymptotically approximates the lower bound as the system scalar gets large enough.
Length-controllable machine translation is a type of constrained translation. It aims to contain the original meaning as much as possible while controlling the length of the translation. We can use automatic summarization or machine translation evaluation metrics for length-controllable machine translation, but this is not necessarily suitable and accurate. This work is the first attempt to evaluate the automatic metrics for length-controllable machine translation tasks systematically. We conduct a rigorous human evaluation on two translation directions and evaluate 18 summarization or translation evaluation metrics. We find that BLEURT and COMET have the highest correlation with human evaluation and are most suitable as evaluation metrics for length-controllable machine translation.