Neurosymbolic AI combines the interpretability, parsimony, and explicit reasoning of classical symbolic approaches with the statistical learning of data-driven neural approaches. Models and policies that are simultaneously differentiable and interpretable may be key enablers of this marriage. This paper demonstrates three pathways to implementing such models and policies in a real-world reinforcement learning setting. Specifically, we study a broad class of neural networks that build interpretable semantics directly into their architecture. We reveal and highlight both the potential and the essential difficulties of combining logic, simulation, and learning. One lesson is that learning benefits from continuity and differentiability, but classical logic is discrete and non-differentiable. The relaxation to real-valued, differentiable representations presents a trade-off; the more learnable, the less interpretable. Another lesson is that using logic in the context of a numerical simulation involves a non-trivial mapping from raw (e.g., real-valued time series) simulation data to logical predicates. Some open questions this note exposes include: What are the limits of rule-based controllers, and how learnable are they? Do the differentiable interpretable approaches discussed here scale to large, complex, uncertain systems? Can we truly achieve interpretability? We highlight these and other themes across the three approaches.
In many problems, the measured variables (e.g., image pixels) are just mathematical functions of the hidden causal variables (e.g., the underlying concepts or objects). For the purpose of making predictions in changing environments or making proper changes to the system, it is helpful to recover the hidden causal variables $Z_i$ and their causal relations represented by graph $\mathcal{G}_Z$. This problem has recently been known as causal representation learning. This paper is concerned with a general, completely nonparametric setting of causal representation learning from multiple distributions (arising from heterogeneous data or nonstationary time series), without assuming hard interventions behind distribution changes. We aim to develop general solutions in this fundamental case; as a by product, this helps see the unique benefit offered by other assumptions such as parametric causal models or hard interventions. We show that under the sparsity constraint on the recovered graph over the latent variables and suitable sufficient change conditions on the causal influences, interestingly, one can recover the moralized graph of the underlying directed acyclic graph, and the recovered latent variables and their relations are related to the underlying causal model in a specific, nontrivial way. In some cases, each latent variable can even be recovered up to component-wise transformations. Experimental results verify our theoretical claims.
Air quality prediction and modelling plays a pivotal role in public health and environment management, for individuals and authorities to make informed decisions. Although traditional data-driven models have shown promise in this domain, their long-term prediction accuracy can be limited, especially in scenarios with sparse or incomplete data and they often rely on black-box deep learning structures that lack solid physical foundation leading to reduced transparency and interpretability in predictions. To address these limitations, this paper presents a novel approach named Physics guided Neural Network for Air Quality Prediction (AirPhyNet). Specifically, we leverage two well-established physics principles of air particle movement (diffusion and advection) by representing them as differential equation networks. Then, we utilize a graph structure to integrate physics knowledge into a neural network architecture and exploit latent representations to capture spatio-temporal relationships within the air quality data. Experiments on two real-world benchmark datasets demonstrate that AirPhyNet outperforms state-of-the-art models for different testing scenarios including different lead time (24h, 48h, 72h), sparse data and sudden change prediction, achieving reduction in prediction errors up to 10%. Moreover, a case study further validates that our model captures underlying physical processes of particle movement and generates accurate predictions with real physical meaning.
Algorithms designed for addressing typical supervised classification problems can only learn from a fixed set of samples and labels, making them unsuitable for the real world, where data arrives as a stream of samples often associated with multiple labels over time. This motivates the study of task-agnostic continual multi-label learning problems. While algorithms using deep learning approaches for continual multi-label learning have been proposed in the recent literature, they tend to be computationally heavy. Although spiking neural networks (SNNs) offer a computationally efficient alternative to artificial neural networks, existing literature has not used SNNs for continual multi-label learning. Also, accurately determining multiple labels with SNNs is still an open research problem. This work proposes a dual output spiking architecture (DOSA) to bridge these research gaps. A novel imbalance-aware loss function is also proposed, improving the multi-label classification performance of the model by making it more robust to data imbalance. A modified F1 score is presented to evaluate the effectiveness of the proposed loss function in handling imbalance. Experiments on several benchmark multi-label datasets show that DOSA trained with the proposed loss function shows improved robustness to data imbalance and obtains better continual multi-label learning performance than CIFDM, a previous state-of-the-art algorithm.
Accurate tooth identification and segmentation in Cone Beam Computed Tomography (CBCT) dental images can significantly enhance the efficiency and precision of manual diagnoses performed by dentists. However, existing segmentation methods are mainly developed based on large data volumes training, on which their annotations are extremely time-consuming. Meanwhile, the teeth of each class in CBCT dental images being closely positioned, coupled with subtle inter-class differences, gives rise to the challenge of indistinct boundaries when training model with limited data. To address these challenges, this study aims to propose a tasked-oriented Masked Auto-Encoder paradigm to effectively utilize large amounts of unlabeled data to achieve accurate tooth segmentation with limited labeled data. Specifically, we first construct a self-supervised pre-training framework of masked auto encoder to efficiently utilize unlabeled data to enhance the network performance. Subsequently, we introduce a sparse masked prompt mechanism based on graph attention to incorporate boundary information of the teeth, aiding the network in learning the anatomical structural features of teeth. To the best of our knowledge, we are pioneering the integration of the mask pre-training paradigm into the CBCT tooth segmentation task. Extensive experiments demonstrate both the feasibility of our proposed method and the potential of the boundary prompt mechanism.
We propose a scheme called MuNES for single mapping and trajectory planning including elevators and stairs. Optimized multifloor trajectories are important for optimal interfloor movements of robots. However, given two or more options of moving between floors, it is difficult to select the best trajectory because there are no suitable indoor multifloor maps in the existing methods. To solve this problem, MuNES creates a single multifloor map including elevators and stairs by estimating altitude changes based on pressure data. In addition, the proposed method performs floor-based loop detection for faster and more accurate loop closure. The single multifloor map is then voxelized leaving only the parts needed for trajectory planning. An optimal and realistic multifloor trajectory is generated by exploring the voxels using an A* algorithm based on the proposed cost function, which affects realistic factors. We tested this algorithm using data acquired from around a campus and note that a single accurate multifloor map could be created. Furthermore, optimal and realistic multifloor trajectory could be found by selecting the means of motion between floors between elevators and stairs according to factors such as the starting point, ending point, and elevator waiting time. The code and data used in this work are available at https://github.com/donghwijung/MuNES.
Driver intention recognition studies increasingly rely on deep neural networks. Deep neural networks have achieved top performance for many different tasks, but it is not a common practice to explicitly analyse the complexity and performance of the network's architecture. Therefore, this paper applies neural architecture search to investigate the effects of the deep neural network architecture on a real-world safety critical application with limited computational capabilities. We explore a pre-defined search space for three deep neural network layer types that are capable to handle sequential data (a long-short term memory, temporal convolution, and a time-series transformer layer), and the influence of different data fusion strategies on the driver intention recognition performance. A set of eight search strategies are evaluated for two driver intention recognition datasets. For the two datasets, we observed that there is no search strategy clearly sampling better deep neural network architectures. However, performing an architecture search does improve the model performance compared to the original manually designed networks. Furthermore, we observe no relation between increased model complexity and higher driver intention recognition performance. The result indicate that multiple architectures yield similar performance, regardless of the deep neural network layer type or fusion strategy.
Driving vehicles in complex scenarios under harsh conditions is the biggest challenge for autonomous vehicles (AVs). To address this issue, we propose hierarchical motion planning and robust control strategy using the front-active steering system in complex scenarios with various slippery road adhesion coefficients while considering vehicle uncertain parameters. Behaviors of human vehicles (HVs) are considered and modeled in the form of a car-following model via the Intelligent Driver Model (IDM). Then, in the upper layer, the motion planner first generates an optimal trajectory by using the artificial potential field (APF) algorithm to formulate any surrounding objects, e.g., road marks, boundaries, and static/dynamic obstacles. To track the generated optimal trajectory, in the lower layer, an offline-constrained output feedback robust model predictive control (RMPC) is employed for the linear parameter varying (LPV) system by applying linear matrix inequality (LMI) optimization method that ensures the robustness against the model parameter uncertainties. Furthermore, by augmenting the system model, our proposed approach, called offline RMPC, achieves outstanding efficiency compared to three existing RMPC approaches, e.g., offset-offline RMPC, online RMPC, and offline RMPC without an augmented model (offline RMPC w/o AM), in both improving computing time and reducing input vibrations.
Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision making problems. A major challenge in ACRL is to ensure agent taking a valid action satisfying constraints in each RL step. Commonly used approach of using a projection layer on top of the policy network requires solving an optimization program which can result in longer training time, slow convergence, and zero gradient problem. To address this, first we use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is also challenging. We develop multiple methods, based on Hamiltonian Monte-Carlo and probabilistic sentential decision diagrams for such action sampling for convex and non-convex constraints. Third, we integrate the learned normalizing flow with the DDPG algorithm. By design, a well-trained normalizing flow will transform policy output into a valid action without requiring an optimization solver. Empirically, our approach results in significantly fewer constraint violations (upto an order-of-magnitude for several instances) and is multiple times faster on a variety of continuous control tasks.
Integrated sensing and communication (ISAC) emerges as a cornerstone technology for the upcoming 6G era, seamlessly incorporating sensing functionality into wireless networks as an inherent capability. This paper undertakes a holistic investigation of two fundamental trade-offs in monostatic OFDM ISAC systems-namely, the time-frequency domain trade-off and the spatial domain trade-off. To ensure robust sensing across diverse modulation orders in the time-frequency domain, including high-order QAM, we design a linear minimum mean-squared-error (LMMSE) estimator tailored for sensing with known, randomly generated signals of varying amplitude. Moreover, we explore spatial domain trade-offs through two ISAC transmission strategies: concurrent, employing joint beams, and time-sharing, using separate, time-non-overlapping beams for sensing and communications. Simulations demonstrate superior performance of the LMMSE estimator in detecting weak targets in the presence of strong ones under high-order QAM, consistently yielding more favorable ISAC trade-offs than existing baselines. Key insights into these trade-offs under various modulation schemes, SNR conditions, target radar cross section (RCS) levels and transmission strategies highlight the merits of the proposed LMMSE approach.