Abstract:This work addresses the challenge of forecasting urban water dynamics by developing a multi-input, multi-output deep learning model that incorporates both endogenous variables (e.g., water height or discharge) and exogenous factors (e.g., precipitation history and forecast reports). Unlike conventional forecasting, the proposed model, AquaCast, captures both inter-variable and temporal dependencies across all inputs, while focusing forecast solely on endogenous variables. Exogenous inputs are fused via an embedding layer, eliminating the need to forecast them and enabling the model to attend to their short-term influences more effectively. We evaluate our approach on the LausanneCity dataset, which includes measurements from four urban drainage sensors, and demonstrate state-of-the-art performance when using only endogenous variables. Performance also improves with the inclusion of exogenous variables and forecast reports. To assess generalization and scalability, we additionally test the model on three large-scale synthesized datasets, generated from MeteoSwiss records, the Lorenz Attractors model, and the Random Fields model, each representing a different level of temporal complexity across 100 nodes. The results confirm that our model consistently outperforms existing baselines and maintains a robust and accurate forecast across both real and synthetic datasets.
Abstract:Cloud platforms are increasingly relied upon to host diverse, resource-intensive workloads due to their scalability, flexibility, and cost-efficiency. In multi-tenant cloud environments, virtual machines are consolidated on shared physical servers to improve resource utilization. While virtualization guarantees resource partitioning for CPU, memory, and storage, it cannot ensure performance isolation. Competition for shared resources such as last-level cache, memory bandwidth, and network interfaces often leads to severe performance degradation. Existing management techniques, including VM scheduling and resource provisioning, require accurate performance prediction to mitigate interference. However, this remains challenging in public clouds due to the black-box nature of VMs and the highly dynamic nature of workloads. To address these limitations, we propose CloudFormer, a dual-branch Transformer-based model designed to predict VM performance degradation in black-box environments. CloudFormer jointly models temporal dynamics and system-level interactions, leveraging 206 system metrics at one-second resolution across both static and dynamic scenarios. This design enables the model to capture transient interference effects and adapt to varying workload conditions without scenario-specific tuning. Complementing the methodology, we provide a fine-grained dataset that significantly expands the temporal resolution and metric diversity compared to existing benchmarks. Experimental results demonstrate that CloudFormer consistently outperforms state-of-the-art baselines across multiple evaluation metrics, achieving robust generalization across diverse and previously unseen workloads. Notably, CloudFormer attains a mean absolute error (MAE) of just 7.8%, representing a substantial improvement in predictive accuracy and outperforming existing methods at least by 28%.
Abstract:Traditional saliency map methods, popularized in computer vision, highlight individual points (pixels) of the input that contribute the most to the model's output. However, in time-series they offer limited insights as semantically meaningful features are often found in other domains. We introduce Cross-domain Integrated Gradients, a generalization of Integrated Gradients. Our method enables feature attributions on any domain that can be formulated as an invertible, differentiable transformation of the time domain. Crucially, our derivation extends the original Integrated Gradients into the complex domain, enabling frequency-based attributions. We provide the necessary theoretical guarantees, namely, path independence and completeness. Our approach reveals interpretable, problem-specific attributions that time-domain methods cannot capture, on three real-world tasks: wearable sensor heart rate extraction, electroencephalography-based seizure detection, and zero-shot time-series forecasting. We release an open-source Tensorflow/PyTorch library to enable plug-and-play cross-domain explainability for time-series models. These results demonstrate the ability of cross-domain integrated gradients to provide semantically meaningful insights in time-series models that are impossible with traditional time-domain saliency.
Abstract:Cardiovascular diseases, a leading cause of noncommunicable disease-related deaths, require early and accurate detection to improve patient outcomes. Taking advantage of advances in machine learning and deep learning, multiple approaches have been proposed in the literature to address the challenge of detecting ECG anomalies. Typically, these methods are based on the manual interpretation of ECG signals, which is time consuming and depends on the expertise of healthcare professionals. The objective of this work is to propose a deep learning system, FADE, designed for normal ECG forecasting and anomaly detection, which reduces the need for extensive labeled datasets and manual interpretation. FADE has been trained in a self-supervised manner with a novel morphological inspired loss function. Unlike conventional models that learn from labeled anomalous ECG waveforms, our approach predicts the future of normal ECG signals, thus avoiding the need for extensive labeled datasets. Using a novel distance function to compare forecasted ECG signals with actual sensor data, our method effectively identifies cardiac anomalies. Additionally, this approach can be adapted to new contexts through domain adaptation techniques. To evaluate our proposal, we performed a set of experiments using two publicly available datasets: MIT-BIH NSR and MIT-BIH Arrythmia. The results demonstrate that our system achieves an average accuracy of 83.84% in anomaly detection, while correctly classifying normal ECG signals with an accuracy of 85.46%. Our proposed approach exhibited superior performance in the early detection of cardiac anomalies in ECG signals, surpassing previous methods that predominantly identify a limited range of anomalies. FADE effectively detects both abnormal heartbeats and arrhythmias, offering significant advantages in healthcare through cost reduction or processing of large-scale ECG data.
Abstract:Ensuring consistent processing quality is challenging in laser processes due to varying material properties and surface conditions. Although some approaches have shown promise in solving this problem via automation, they often rely on predetermined targets or are limited to simulated environments. To address these shortcomings, we propose a novel real-time reinforcement learning approach for laser process control, implemented on a Field Programmable Gate Array to achieve real-time execution. Our experimental results from laser welding tests on stainless steel samples with a range of surface roughnesses validated the method's ability to adapt autonomously, without relying on reward engineering or prior setup information. Specifically, the algorithm learned the correct power profile for each unique surface characteristic, demonstrating significant improvements over hand-engineered optimal constant power strategies -- up to 23% better performance on rougher surfaces and 7% on mixed surfaces. This approach represents a significant advancement in automating and optimizing laser processes, with potential applications across multiple industries.
Abstract:Generating synthetic Electronic Health Records (EHRs) offers significant potential for data augmentation, privacy-preserving data sharing, and improving machine learning model training. We propose a novel tokenization strategy tailored for structured EHR data, which encompasses diverse data types such as covariates, ICD codes, and irregularly sampled time series. Using a GPT-like decoder-only transformer model, we demonstrate the generation of high-quality synthetic EHRs. Our approach is evaluated using the MIMIC-III dataset, and we benchmark the fidelity, utility, and privacy of the generated data against state-of-the-art models.
Abstract:Efficient deployment of resource-intensive transformers on edge devices necessitates cross-stack optimization. We thus study the interrelation between structured pruning and systolic acceleration, matching the size of pruned blocks with the systolic array dimensions. In this setting, computations of pruned weight blocks can be skipped, reducing run-time and energy consumption, but potentially impacting quality of service (QoS). To evaluate the trade-offs between systolic array size and sparsity opportunities, we present a novel co-design framework that integrates algorithmic optimization, system simulation, and hardware design. Targeting speech recognition using transformers as a case study, we analyze how configuration choices across the stack affect performance metrics. Results demonstrate that structured pruning on systems featuring systolic array acceleration can effectively increase performance, while maintaining high QoS levels. Up to 26% system-wide speedups due to structured pruning were measured, with only 1.4% word error rate degradation on the standard Librispeech dataset.
Abstract:Breakthroughs in ultra-low-power chip technology are transforming biomedical wearables, making it possible to monitor patients in real time with devices operating on mere {\mu}W. Although many studies have examined the power performance of commercial microcontrollers, it remains unclear which ones perform best across diverse application profiles and which hardware features are most crucial for minimizing energy consumption under varying computational loads. Identifying these features for typical wearable applications and understanding their effects on performance and energy efficiency are essential for optimizing deployment strategies and informing future hardware designs. In this work, we conduct an in-depth study of state-of-the-art (SoA) micro-controller units(MCUs) in terms of processing capability and energy efficiency using representative end-to-end SoA wearable applications. We systematically benchmark each platform across three primary application phases: idle, data acquisition, and processing, allowing a holistic assessment of the platform processing capability and overall energy efficiency across varying patient-monitoring application profiles. Our detailed analysis of performance and energy discrepancies across different platforms reveals key strengths and limitations of the current low-power hardware design and pinpoints the strengths and weaknesses of SoA MCUs. We conclude with actionable insights for wearable application designers and hardware engineers, aiming to inform future hardware design improvements and support optimal platform selection for energy-constrained biomedical applications.
Abstract:Continuous cough monitors can greatly aid doctors in home monitoring and treatment of respiratory diseases. Although many algorithms have been proposed, they still face limitations in data privacy and short-term monitoring. Edge-AI offers a promising solution by processing privacy-sensitive data near the source, but challenges arise in deploying resource-intensive algorithms on constrained devices. From a suitable selection of audio and kinematic signals, our methodology aims at the optimal selection of features via Recursive Feature Elimination with Cross-Validation (RFECV), which exploits the explainability of the selected XGB model. Additionally, it analyzes the use of Mel spectrogram features, instead of the more common MFCC. Moreover, a set of hyperparameters for a multimodal implementation of the classifier is explored. Finally, it evaluates the performance based on clinically relevant event-based metrics. We apply our methodology to develop Cough-E, an energy-efficient, multimodal and edge AI cough detection algorithm. It exploits audio and kinematic data in two distinct classifiers, jointly cooperating for a balanced energy and performance trade-off. We demonstrate that our algorithm can be executed in real-time on an ARM Cortex M33 microcontroller. Cough-E achieves a 70.56\% energy saving when compared to the audio-only approach, at the cost of a 1.26\% relative performance drop, resulting in a 0.78 F1-score. Both Cough-E and the edge-aware model optimization methodology are publicly available as open-source code. This approach demonstrates the benefits of the proposed hardware-aware methodology to enable privacy-preserving cough monitors on the edge, paving the way to efficient cough monitoring.
Abstract:Deep learning time-series processing often relies on convolutional neural networks with overlapping windows. This overlap allows the network to produce an output faster than the window length. However, it introduces additional computations. This work explores the potential to optimize computational efficiency during inference by exploiting convolution's shift-invariance properties to skip the calculation of layer activations between successive overlapping windows. Although convolutions are shift-invariant, zero-padding and pooling operations, widely used in such networks, are not efficient and complicate efficient streaming inference. We introduce StreamiNNC, a strategy to deploy Convolutional Neural Networks for online streaming inference. We explore the adverse effects of zero padding and pooling on the accuracy of streaming inference, deriving theoretical error upper bounds for pooling during streaming. We address these limitations by proposing signal padding and pooling alignment and provide guidelines for designing and deploying models for StreamiNNC. We validate our method in simulated data and on three real-world biomedical signal processing applications. StreamiNNC achieves a low deviation between streaming output and normal inference for all three networks (2.03 - 3.55% NRMSE). This work demonstrates that it is possible to linearly speed up the inference of streaming CNNs processing overlapping windows, negating the additional computation typically incurred by overlapping windows.