Given trajectories with gaps (i.e., missing data), we investigate algorithms to identify abnormal gaps in trajectories which occur when a given moving object did not report its location, but other moving objects in the same geographic region periodically did. The problem is important due to its societal applications, such as improving maritime safety and regulatory enforcement for global security concerns such as illegal fishing, illegal oil transfers, and trans-shipments. The problem is challenging due to the difficulty of bounding the possible locations of the moving object during a trajectory gap, and the very high computational cost of detecting gaps in such a large volume of location data. The current literature on anomalous trajectory detection assumes linear interpolation within gaps, which may not be able to detect abnormal gaps since objects within a given region may have traveled away from their shortest path. In preliminary work, we introduced an abnormal gap measure that uses a classical space-time prism model to bound an object's possible movement during the trajectory gap and provided a scalable memoized gap detection algorithm (Memo-AGD). In this paper, we propose a Space Time-Aware Gap Detection (STAGD) approach to leverage space-time indexing and merging of trajectory gaps. We also incorporate a Dynamic Region Merge-based (DRM) approach to efficiently compute gap abnormality scores. We provide theoretical proofs that both algorithms are correct and complete and also provide analysis of asymptotic time complexity. Experimental results on synthetic and real-world maritime trajectory data show that the proposed approach substantially improves computation time over the baseline technique.
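The space-time prism bound mentioned in the abstract above can be illustrated with a minimal sketch. It is not Memo-AGD or STAGD, only the classical reachability test: a location is reachable during a gap if the detour through it fits within the distance the object could cover at its maximum speed. The function name, the planar (x, y) coordinates, and the sample values are assumptions made for illustration.

```python
import math


def within_space_time_prism(point, start, end, t_start, t_end, max_speed):
    """Return True if `point` could have been visited during the gap.

    Assumes planar coordinates and a known maximum speed (both hypothetical
    here). The reachable area is an ellipse whose foci are the last fix
    before the gap and the first fix after it.
    """
    # Total distance the object could travel during the gap.
    budget = max_speed * (t_end - t_start)
    # Distance of the detour start -> point -> end.
    detour = math.dist(start, point) + math.dist(point, end)
    return detour <= budget


# Hypothetical gap: two fixes 10 units apart, 2 time units, max speed 10.
start, end = (0.0, 0.0), (10.0, 0.0)
on_path = within_space_time_prism((5.0, 0.0), start, end, 0.0, 2.0, 10.0)
far_away = within_space_time_prism((5.0, 20.0), start, end, 0.0, 2.0, 10.0)
```

Linear interpolation would consider only the straight segment between the two fixes, whereas this test admits every location inside the ellipse.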
Given multi-category point sets from different place-types, our goal is to develop a spatially-lucid classifier that can distinguish between two classes based on the arrangements of their points. This problem is important for many applications, such as oncology, for analyzing immune-tumor relationships and designing new immunotherapies. It is challenging due to spatial variability and interpretability needs. Previously proposed techniques require dense training data or have limited ability to handle significant spatial variability within a single place-type. Most importantly, these deep neural network (DNN) approaches are not designed to work in non-Euclidean space, particularly point sets. Existing non-Euclidean DNN methods are limited to one-size-fits-all approaches. We explore a spatial ensemble framework that explicitly uses different training strategies, including weighted-distance learning rate and spatial domain adaptation, on various place-types for spatially-lucid classification. Experimental results on real-world datasets (e.g., MxIF oncology data) show that the proposed framework provides higher prediction accuracy than baseline methods.
Given multi-model ensemble climate projections, the goal is to accurately and reliably predict future sea-level rise while lowering the uncertainty. This problem is important because sea-level rise affects millions of people in coastal communities and beyond due to climate change's impacts on polar ice sheets and the ocean. This problem is challenging due to spatial variability and unknowns such as possible tipping points (e.g., collapse of Greenland or West Antarctic ice-shelf), climate feedback loops (e.g., clouds, permafrost thawing), future policy decisions, and human actions. Most existing climate modeling approaches use the same set of weights globally, during either regression or deep learning, to combine different climate projections. Such approaches are inadequate when different regions require different weighting schemes for accurate and reliable sea-level rise predictions. This paper proposes a zonal regression model that addresses spatial variability and model inter-dependency. Experimental results show that the weights learned via this approach yield more reliable predictions on a regional scale.
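The contrast between one global weight vector and region-specific weights can be sketched as follows. This is plain per-zone least squares on synthetic data, not the zonal regression model of the paper (which also addresses model inter-dependency); the function name, array shapes, zone labels, and "true" weights are all assumptions made for illustration.

```python
import numpy as np


def fit_zonal_weights(projections, observed, zones):
    """Fit one least-squares weight vector per zone (illustrative only).

    projections: (n_samples, n_models) outputs of the ensemble members
    observed:    (n_samples,) observed values
    zones:       (n_samples,) integer zone label of each sample
    """
    weights = {}
    for zone in np.unique(zones):
        mask = zones == zone
        # Solve min ||P_zone w - y_zone||^2 for this zone only.
        w, *_ = np.linalg.lstsq(projections[mask], observed[mask], rcond=None)
        weights[int(zone)] = w
    return weights


# Synthetic data: two models, two zones that need different weightings.
rng = np.random.default_rng(0)
projections = rng.normal(size=(200, 2))
zones = np.repeat(np.array([0, 1]), 100)
true_w = {0: np.array([0.9, 0.1]), 1: np.array([0.2, 0.8])}
observed = np.array([projections[i] @ true_w[int(z)] for i, z in enumerate(zones)])

zonal_weights = fit_zonal_weights(projections, observed, zones)
global_w, *_ = np.linalg.lstsq(projections, observed, rcond=None)
```

On this noiseless toy data the per-zone fit recovers each zone's weights, while the single global fit cannot match both zones at once.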
Ordinary and partial differential equations (DE) are used extensively in scientific and mathematical domains to model physical systems. Current literature has focused primarily on deep neural network (DNN) based methods for solving a specific DE or a family of DEs. Research communities with a history of using DE models may view DNN-based differential equation solvers (DNN-DEs) as a faster and transferable alternative to current numerical methods. However, there is a lack of systematic surveys detailing the use of DNN-DE methods across physical application domains and a generalized taxonomy to guide future research. This paper surveys and classifies previous works and provides an educational tutorial for senior practitioners, professionals, and graduate students in engineering and computer science. First, we propose a taxonomy to navigate domains of DE systems studied under the umbrella of DNN-DE. Second, we examine the theory and performance of the Physics Informed Neural Network (PINN) to demonstrate how the influential DNN-DE architecture mathematically solves a system of equations. Third, to reinforce the key ideas of solving and discovery of DEs using DNN, we provide a tutorial using DeepXDE, a Python package for developing PINNs, to develop DNN-DEs for solving and discovering a classic DE, the linear transport equation.
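For the linear transport equation named above, the usual PINN training objective can be written out as a sketch. The constant speed c, the initial profile u_0, the network u_theta, and the sample counts N_r and N_0 are notation assumed here for illustration; boundary terms, if any, would be added in the same way.

```latex
% Linear transport equation with an assumed constant speed c
\frac{\partial u}{\partial t} + c\,\frac{\partial u}{\partial x} = 0,
\qquad u(x, 0) = u_0(x)

% PINN loss: PDE residual at collocation points plus initial-condition misfit
\mathcal{L}(\theta) =
  \frac{1}{N_r} \sum_{i=1}^{N_r}
    \left| \frac{\partial u_\theta}{\partial t}(x_i, t_i)
         + c\,\frac{\partial u_\theta}{\partial x}(x_i, t_i) \right|^2
+ \frac{1}{N_0} \sum_{j=1}^{N_0}
    \left| u_\theta(x_j, 0) - u_0(x_j) \right|^2
```

The derivatives of u_theta are obtained by automatic differentiation, so minimizing the first term drives the network toward satisfying the equation without labeled solution data.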
The eco-toll estimation problem quantifies the expected environmental cost (e.g., energy consumption, exhaust emissions) for a vehicle to travel along a path. This problem is important for societal applications such as eco-routing, which aims to find paths with the lowest exhaust emissions or energy needs. The challenges of this problem are three-fold: (1) the dependence of a vehicle's eco-toll on its physical parameters; (2) the lack of access to data with eco-toll information; and (3) the influence of contextual information (i.e., the connections of adjacent segments in the path) on the eco-toll of road segments. Prior work on eco-toll estimation has mostly relied on purely data-driven approaches and has high estimation errors given the limited training data. To address these limitations, we propose a novel Eco-toll estimation Physics-informed Neural Network framework (Eco-PiNN) using three novel ideas, namely, (1) a physics-informed decoder that integrates the physical laws of the vehicle engine into the network, (2) an attention-based contextual information encoder, and (3) a physics-informed regularization to reduce overfitting. Experiments on real-world heavy-duty truck data show that the proposed method can greatly improve the accuracy of eco-toll estimation compared with state-of-the-art methods.
Spatiotemporal data mining aims to discover interesting, useful but non-trivial patterns in big spatial and spatiotemporal data. Such patterns are used in various application domains, such as public safety, ecology, epidemiology, and earth science. This problem is challenging because of the high societal cost of spurious patterns and the exorbitant computational cost. Recent surveys of spatiotemporal data mining need updating due to the rapid growth of the field. In addition, they did not adequately survey parallel techniques for spatiotemporal data mining. This paper provides a more up-to-date survey of spatiotemporal data mining methods. Furthermore, it includes a detailed survey of parallel formulations of spatiotemporal data mining.
Point set classification aims to build a representation learning model that distinguishes between spatial and categorical configurations of point set data. This problem is societally important in many application domains, such as immunology and microbial ecology. This problem is challenging since the interactions between different categories of points are not always equal; as a result, the representation learning model must selectively learn the most relevant multi-categorical relationships. The related works (1) are limited in learning the importance of different multi-categorical relationships, especially for high-order interactions, and (2) do not fully exploit the spatial distribution of points beyond simply measuring relative distance or applying a feed-forward neural network to coordinates. To overcome these limitations, we leverage the dynamic graph convolutional neural network (DGCNN) architecture to design a novel multi-category DGCNN (MC-DGCNN), contributing location representation and point pair attention layers for multi-categorical point set classification. MC-DGCNN can identify the categorical importance of each point pair and extends this to N-way spatial relationships, while still preserving all the properties and benefits of DGCNN (e.g., differentiability). Experimental results show that the proposed architecture is computationally efficient and significantly outperforms current deep learning architectures on real-world datasets.
Given Spatial Variability Aware Neural Networks (SVANNs), the goal is to investigate mathematical (or computational) models for comparative physical interpretation towards their transparency (e.g., simulatability, decomposability, and algorithmic transparency). This problem is important due to use cases such as reusability, debugging, and explainability to a jury in a court of law. Challenges include a large number of model parameters, vacuous bounds on generalization performance of neural networks, risk of overfitting, sensitivity to noise, etc., which all detract from the ability to interpret the models. Related work on either model-specific or model-agnostic post-hoc interpretation is limited due to a lack of consideration of physical constraints (e.g., mass balance) and properties (e.g., second law of geography). This work investigates physical interpretation of SVANNs using novel comparative approaches based on geographically heterogeneous features. The proposed approach to feature-based physical interpretation is evaluated using a case study on wetland mapping. The proposed physical interpretation improves the transparency of SVANN models, and the analytical results highlight the trade-off between model transparency and model performance (e.g., F1-score). We also describe an interpretation based on geographically heterogeneous processes modeled as partial differential equations (PDEs).
Given an on-board diagnostics (OBD) dataset and a physics-based emissions prediction model, this paper aims to develop an accurate and computationally efficient AI (Artificial Intelligence) method that predicts vehicle emissions. The problem is of societal importance because vehicular emissions lead to climate change and impact human health. This problem is challenging because the OBD data does not contain enough of the parameters needed by high-order physics models. Conversely, related work has shown that low-order physics models have poor predictive accuracy when using available OBD data. This paper uses a divergent window co-occurrence pattern detection method to develop a spatiotemporal variability-aware AI model for predicting emission values from the OBD datasets. We conducted a case study using real-world OBD data from a local public transportation agency. Results show that the proposed AI method is approximately 65% more accurate than a non-AI low-order physics model and approximately 35% more accurate than a baseline model.
Mapping of spatial hotspots, i.e., regions with significantly higher rates or probability density of generating certain events (e.g., disease or crime cases), is an important task in diverse societal domains, including public health, public safety, transportation, agriculture, and environmental science. Clustering techniques required by these domains differ from traditional clustering methods due to the high economic and social costs of spurious results (e.g., false alarms of crime clusters). As a result, statistical rigor is needed to explicitly control the rate of spurious detections. To address this challenge, techniques for statistically-robust clustering have been extensively studied by the data mining and statistics communities. In this survey, we present an up-to-date and detailed review of the models and algorithms developed by this field. We first present a general taxonomy of the clustering process with statistical rigor, covering key steps of data and statistical modeling, region enumeration and maximization, significance testing, and data update. We further discuss different paradigms and methods within each of the key steps. Finally, we highlight research gaps and potential future directions, which may serve as a stepping stone in generating new ideas and thoughts in this growing field and beyond.
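The region enumeration, maximization, and significance testing steps listed above can be sketched with a small Monte Carlo test. This is a deliberately simplified illustration, not a method from the survey: candidate regions are the cells of a fixed grid over the unit square, the test statistic is the largest cell count, and the null hypothesis is complete spatial randomness. All function names, the grid size, and the synthetic data are assumptions.

```python
import numpy as np


def max_cell_count(points, grid=4):
    """Enumerate grid cells over the unit square and return the largest count."""
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=grid, range=[[0, 1], [0, 1]]
    )
    return counts.max()


def monte_carlo_p_value(points, trials=199, grid=4, seed=0):
    """Estimate how often uniform random data yields a cell at least as dense."""
    rng = np.random.default_rng(seed)
    observed = max_cell_count(points, grid)
    exceed = 0
    for _ in range(trials):
        # Simulate the null: the same number of points, uniformly distributed.
        null_points = rng.random((len(points), 2))
        # Maximize over all cells again so the enumeration is accounted for.
        if max_cell_count(null_points, grid) >= observed:
            exceed += 1
    return (exceed + 1) / (trials + 1)


# Synthetic data: 100 background points plus 40 points packed into one cell.
data_rng = np.random.default_rng(1)
background = data_rng.random((100, 2))
cluster = 0.05 + 0.15 * data_rng.random((40, 2))
p_value = monte_carlo_p_value(np.vstack([background, cluster]))
uniform_p_value = monte_carlo_p_value(background)
```

Taking the maximum over all cells in every simulated dataset is what keeps the rate of spurious detections under control; testing only the densest observed cell against a single-cell null would overstate significance.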