We introduce GeXSe (Generative Explanatory Sensor System), a novel framework designed to extract interpretable sensor-based and vision domain features from non-invasive smart space sensors. We combine these to provide a comprehensive explanation of sensor-activation patterns in activity recognition tasks. This system leverages advanced machine learning architectures, including transformer blocks, Fast Fourier Convolution (FFC), and diffusion models, to provide a more detailed understanding of sensor-based human activity data. A standout feature of GeXSe is our unique Multi-Layer Perceptron (MLP) with linear, ReLU, and normalization layers, specially devised for optimal performance on small datasets. It also yields meaningful activation maps to explain sensor-based activation patterns. The standard approach is based on a CNN model, which our MLP model outperforms.GeXSe offers two types of explanations: sensor-based activation maps and visual domain explanations using short videos. These methods offer a comprehensive interpretation of the output from non-interpretable sensor data, thereby augmenting the interpretability of our model. Utilizing the Frechet Inception Distance (FID) for evaluation, it outperforms established methods, improving baseline performance by about 6\%. GeXSe also achieves a high F1 score of up to 0.85, demonstrating precision, recall, and noise resistance, marking significant progress in reliable and explainable smart space sensing systems.
Timeseries partitioning is an essential step in most machine-learning driven, sensor-based IoT applications. This paper introduces a sample-efficient, robust, time-series segmentation model and algorithm. We show that by learning a representation specifically with the segmentation objective based on maximum mean discrepancy (MMD), our algorithm can robustly detect time-series events across different applications. Our loss function allows us to infer whether consecutive sequences of samples are drawn from the same distribution (null hypothesis) and determines the change-point between pairs that reject the null hypothesis (i.e., come from different distributions). We demonstrate its applicability in a real-world IoT deployment for ambient-sensing based activity recognition. Moreover, while many works on change-point detection exist in the literature, our model is significantly simpler and matches or outperforms state-of-the-art methods. We can fully train our model in 9-93 seconds on average with little variation in hyperparameters for data across different applications.
Quality sleep is very important for a healthy life. Nowadays, many people around the world are not getting enough sleep which is having negative impacts on their lifestyles. Studies are being conducted for sleep monitoring and have now become an important tool for understanding sleep behavior. The gold standard method for sleep analysis is polysomnography (PSG) conducted in a clinical environment but this method is both expensive and complex for long-term use. With the advancements in the field of sensors and the introduction of off-the-shelf technologies, unobtrusive solutions are becoming common as alternatives for in-home sleep monitoring. Various solutions have been proposed using both wearable and non-wearable methods which are cheap and easy to use for in-home sleep monitoring. In this paper, we present a comprehensive survey of the latest research works (2015 and after) conducted in various categories of sleep monitoring including sleep stage classification, sleep posture recognition, sleep disorders detection, and vital signs monitoring. We review the latest works done using the non-invasive approach and cover both wearable and non-wearable methods. We discuss the design approaches and key attributes of the work presented and provide an extensive analysis based on 10 key factors, to give a comprehensive overview of the recent developments and trends in all four categories of sleep monitoring. We also present some publicly available datasets for different categories of sleep monitoring. In the end, we discuss several open issues and provide future research directions in the area of sleep monitoring.
We introduce a new semi-supervised, time series anomaly detection algorithm that uses deep reinforcement learning (DRL) and active learning to efficiently learn and adapt to anomalies in real-world time series data. Our model - called RLAD - makes no assumption about the underlying mechanism that produces the observation sequence and continuously adapts the detection model based on experience with anomalous patterns. In addition, it requires no manual tuning of parameters and outperforms all state-of-art methods we compare with, both unsupervised and semi-supervised, across several figures of merit. More specifically, we outperform the best unsupervised approach by a factor of 1.58 on the F1 score, with only 1% of labels and up to around 4.4x on another real-world dataset with only 0.1% of labels. We compare RLAD with seven deep-learning based algorithms across two common anomaly detection datasets with up to around 3M data points and between 0.28% to 2.65% anomalies.We outperform all of them across several important performance metrics.
Supervised machine learning (ML) algorithms are aimed at maximizing classification performance under available energy and storage constraints. They try to map the training data to the corresponding labels while ensuring generalizability to unseen data. However, they do not integrate meaning-based relationships among labels in the decision process. On the other hand, natural language processing (NLP) algorithms emphasize the importance of semantic information. In this paper, we synthesize the complementary advantages of supervised ML and natural language processing algorithms into one method that we refer to as SECRET (Semantically Enhanced Classification of REal-world Tasks). SECRET performs classifications by fusing the semantic information of the labels with the available data: it combines the feature space of the supervised algorithms with the semantic space of the NLP algorithms and predicts labels based on this joint space. Experimental results indicate that, compared to traditional supervised learning, SECRET achieves up to 13.9% accuracy and 13.5% F1 score improvements. Moreover, compared to ensemble methods, SECRET achieves up to 12.6% accuracy and 13.8% F1 score improvements. This points to a new research direction for supervised classification by incorporating semantic information.
Internet of things (IoT) applications have become increasingly popular in recent years, with applications ranging from building energy monitoring to personal health tracking and activity recognition. In order to leverage these data, automatic knowledge extraction - whereby we map from observations to interpretable states and transitions - must be done at scale. As such, we have seen many recent IoT data sets include annotations with a human expert specifying states, recorded as a set of boundaries and associated labels in a data sequence. These data can be used to build automatic labeling algorithms that produce labels as an expert would. Here, we refer to human-specified boundaries as breakpoints. Traditional changepoint detection methods only look for statistically-detectable boundaries that are defined as abrupt variations in the generative parameters of a data sequence. However, we observe that breakpoints occur on more subtle boundaries that are non-trivial to detect with these statistical methods. In this work, we propose a new unsupervised approach, based on deep learning, that outperforms existing techniques and learns the more subtle, breakpoint boundaries with a high accuracy. Through extensive experiments on various real-world data sets - including human-activity sensing data, speech signals, and electroencephalogram (EEG) activity traces - we demonstrate the effectiveness of our algorithm for practical applications. Furthermore, we show that our approach achieves significantly better performance than previous methods.
Machine learning algorithms, in conjunction with user data, hold the promise of revolutionizing the way we interact with our phones, and indeed their widespread adoption in the design of apps bear testimony to this promise. However, currently, the computationally expensive segments of the learning pipeline, such as feature extraction and model training, are offloaded to the cloud, resulting in an over-reliance on the network and under-utilization of computing resources available on mobile platforms. In this paper, we show that by combining the computing power distributed over a number of phones, judicious optimization choices, and contextual information it is possible to execute the end-to-end pipeline entirely on the phones at the edge of the network, efficiently. We also show that by harnessing the power of this combination, it is possible to execute a computationally expensive pipeline at near real-time. To demonstrate our approach, we implement an end-to-end image-processing pipeline -- that includes feature extraction, vocabulary learning, vectorization, and image clustering -- on a set of mobile phones. Our results show a 75% improvement over the standard, full pipeline implementation running on the phones without modification -- reducing the time to one minute under certain conditions. We believe that this result is a promising indication that fully distributed, infrastructure-less computing is possible on networks of mobile phones; enabling a new class of mobile applications that are less reliant on the cloud.
Many sensors/meters are deployed in commercial buildings to monitor and optimize their performance. However, because sensor metadata is inconsistent across buildings, software-based solutions are tightly coupled to the sensor metadata conventions (i.e. schemas and naming) for each building. Running the same software across buildings requires significant integration effort. Metadata normalization is critical for scaling the deployment process and allows us to decouple building-specific conventions from the code written for building applications. It also allows us to deal with missing metadata. One important aspect of normalization is to differentiate sensors by the typeof phenomena being observed. In this paper, we propose a general, simple, yet effective classification scheme to differentiate sensors in buildings by type. We perform ensemble learning on data collected from over 2000 sensor streams in two buildings. Our approach is able to achieve more than 92% accuracy for classification within buildings and more than 82% accuracy for across buildings. We also introduce a method for identifying potential misclassified streams. This is important because it allows us to identify opportunities to attain more input from experts -- input that could help improve classification accuracy when ground truth is unavailable. We show that by adjusting a threshold value we are able to identify at least 30% of the misclassified instances.