Behavior trees represent a hierarchical and modular way of combining several low-level control policies into a high-level task-switching policy. Hybrid dynamical systems can also be seen in terms of task switching between different policies, and therefore several comparisons between behavior trees and hybrid dynamical systems have been made, but only informally, and only in discrete time. A formal continuous-time formulation of behavior trees has been lacking. Additionally, convergence analyses of specific classes of behavior tree designs have been made, but not for general designs. In this letter, we provide the first continuous-time formulation of behavior trees, show that they can be seen as discontinuous dynamical systems (a subclass of hybrid dynamical systems), which enables the application of existence and uniqueness results to behavior trees, and finally, provide sufficient conditions under which such systems will converge to a desired region of the state space for general designs. With these results, a large body of results on continuous-time dynamical systems can be brought to use when designing behavior tree controllers.
From the moment Neural Networks dominated the scene for image processing, the computational complexity needed to solve the targeted tasks skyrocketed: against such an unsustainable trend, many strategies have been developed, ambitiously targeting performance's preservation. Promoting sparse topologies, for example, allows the deployment of deep neural networks models on embedded, resource-constrained devices. Recently, Capsule Networks were introduced to enhance explainability of a model, where each capsule is an explicit representation of an object or its parts. These models show promising results on toy datasets, but their low scalability prevents deployment on more complex tasks. In this work, we explore sparsity besides capsule representations to improve their computational efficiency by reducing the number of capsules. We show how pruning with Capsule Network achieves high generalization with less memory requirements, computational effort, and inference and training time.
In this study, we present a deep learning-optimization framework to tackle dynamic mixed-integer programs. Specifically, we develop a bidirectional Long Short Term Memory (LSTM) framework that can process information forward and backward in time to learn optimal solutions to sequential decision-making problems. We demonstrate our approach in predicting the optimal decisions for the single-item capacitated lot-sizing problem (CLSP), where a binary variable denotes whether to produce in a period or not. Due to the dynamic nature of the problem, the CLSP can be treated as a sequence labeling task where a recurrent neural network can capture the problem's temporal dynamics. Computational results show that our LSTM-Optimization (LSTM-Opt) framework significantly reduces the solution time of benchmark CLSP problems without much loss in feasibility and optimality. For example, the predictions at the 85\% level reduce the CPLEX solution time by a factor of 9 on average for over 240,000 test instances with an optimality gap of less than 0.05\% and 0.4\% infeasibility in the test set. Also, models trained using shorter planning horizons can successfully predict the optimal solution of the instances with longer planning horizons. For the hardest data set, the LSTM predictions at the 25\% level reduce the solution time of 70 CPU hours to less than 2 CPU minutes with an optimality gap of 0.8\% and without any infeasibility. The LSTM-Opt framework outperforms classical ML algorithms, such as the logistic regression and random forest, in terms of the solution quality, and exact approaches, such as the ($\ell$, S) and dynamic programming-based inequalities, with respect to the solution time improvement. Our machine learning approach could be beneficial in tackling sequential decision-making problems similar to CLSP, which need to be solved repetitively, frequently, and in a fast manner.
In real-world phenomena which involve mutual influence or causal effects between interconnected units, equilibrium states are typically represented with cycles in graphical models. An expressive class of graphical models, \textit{relational causal models}, can represent and reason about complex dynamic systems exhibiting such cycles or feedback loops. Existing cyclic causal discovery algorithms for learning causal models from observational data assume that the data instances are independent and identically distributed which makes them unsuitable for relational causal models. At the same time, causal discovery algorithms for relational causal models assume acyclicity. In this work, we examine the necessary and sufficient conditions under which a constraint-based relational causal discovery algorithm is sound and complete for \textit{cyclic relational causal models}. We introduce \textit{relational acyclification}, an operation specifically designed for relational models that enables reasoning about the identifiability of cyclic relational causal models. We show that under the assumptions of relational acyclification and $\sigma$-faithfulness, the relational causal discovery algorithm RCD (Maier et al. 2013) is sound and complete for cyclic models. We present experimental results to support our claim.
Recent deep learning models have achieved high performance in speech enhancement; however, it is still challenging to obtain a fast and low-complexity model without significant performance degradation. Previous knowledge distillation studies on speech enhancement could not solve this problem because their output distillation methods do not fit the speech enhancement task in some aspects. In this study, we propose multi-view attention transfer (MV-AT), a feature-based distillation, to obtain efficient speech enhancement models in the time domain. Based on the multi-view features extraction model, MV-AT transfers multi-view knowledge of the teacher network to the student network without additional parameters. The experimental results show that the proposed method consistently improved the performance of student models of various sizes on the Valentini and deep noise suppression (DNS) datasets. MANNER-S-8.1GF with our proposed method, a lightweight model for efficient deployment, achieved 15.4x and 4.71x fewer parameters and floating-point operations (FLOPs), respectively, compared to the baseline model with similar performance.
With thousands of news articles from hundreds of sources distributed and shared every day, news consumption and information acquisition have been increasingly difficult for readers. Additionally, the content of news articles is becoming catchy or even inciting to attract readership, harming the accuracy of news reporting. We present Islander, an online news analyzing system. The system allows users to browse trending topics with articles from multiple sources and perspectives. We define several metrics as proxies for news quality, and develop algorithms for automatic estimation. The quality estimation results are delivered through a web interface to newsreaders for easy access to news and information. The website is publicly available at https://islander.cc/
We propose a new semi-analytic physics informed neural network (PINN) to solve singularly perturbed boundary value problems. The PINN is a scientific machine learning framework that offers a promising perspective for finding numerical solutions to partial differential equations. The PINNs have shown impressive performance in solving various differential equations including time-dependent and multi-dimensional equations involved in a complex geometry of the domain. However, when considering stiff differential equations, neural networks in general fail to capture the sharp transition of solutions, due to the spectral bias. To resolve this issue, here we develop the semi-analytic PINN methods, enriched by using the so-called corrector functions obtained from the boundary layer analysis. Our new enriched PINNs accurately predict numerical solutions to the singular perturbation problems. Numerical experiments include various types of singularly perturbed linear and nonlinear differential equations.
Deep neural networks have a clear degradation when applying to the unseen environment due to the covariate shift. Conventional approaches like domain adaptation requires the pre-collected target data for iterative training, which is impractical in real-world applications. In this paper, we propose to adapt the deep models to the novel environment during inference. An previous solution is test time normalization, which substitutes the source statistics in BN layers with the target batch statistics. However, we show that test time normalization may potentially deteriorate the discriminative structures due to the mismatch between target batch statistics and source parameters. To this end, we present a general formulation $\alpha$-BN to calibrate the batch statistics by mixing up the source and target statistics for both alleviating the domain shift and preserving the discriminative structures. Based on $\alpha$-BN, we further present a novel loss function to form a unified test time adaptation framework Core, which performs the pairwise class correlation online optimization. Extensive experiments show that our approaches achieve the state-of-the-art performance on total twelve datasets from three topics, including model robustness to corruptions, domain generalization on image classification and semantic segmentation. Particularly, our $\alpha$-BN improves 28.4\% to 43.9\% on GTA5 $\rightarrow$ Cityscapes without any training, even outperforms the latest source-free domain adaptation method.
Suppose $L$ simultaneous independent stochastic systems generate observations, where the observations from each system depend on the underlying parameter of that system. The observations are unlabeled (anonymized), in the sense that an analyst does not know which observation came from which stochastic system. How can the analyst estimate the underlying parameters of the $L$ systems? Since the anonymized observations at each time are an unordered set of L measurements (rather than a vector), classical stochastic gradient algorithms cannot be directly used. By using symmetric polynomials, we formulate a symmetric measurement equation that maps the observation set to a unique vector. By exploiting that fact that the algebraic ring of multi-variable polynomials is a unique factorization domain over the ring of one-variable polynomials, we construct an adaptive filtering algorithm that yields a statistically consistent estimate of the underlying parameters. We analyze the asymptotic covariance of these estimates to quantify the effect of anonymization. Finally, we characterize the anonymity of the observations in terms of the error probability of the maximum aposteriori Bayesian estimator. Using Blackwell dominance of mean preserving spreads, we construct a partial ordering of the noise densities which relates the anonymity of the observations to the asymptotic covariance of the adaptive filtering algorithm.
Semantic segmentation has attracted a large amount of attention in recent years. In robotics, segmentation can be used to identify a region of interest, or \emph{target area}. For example, in the RoboCup Standard Platform League (SPL), segmentation separates the soccer field from the background and from players on the field. For satellite or vehicle applications, it is often necessary to find certain regions such as roads, bodies of water or kinds of terrain. In this paper, we propose a novel approach to real-time target area segmentation based on a newly designed spatial temporal network. The method operates under domain constraints defined by both the robot's hardware and its operating environment . The proposed network is able to run in real-time, working within the constraints of limited run time and computing power. This work is compared against other real time segmentation methods on a dataset generated by a Nao V6 humanoid robot simulating the RoboCup SPL competition. In this case, the target area is defined as the artificial grass field. The method is also tested on a maritime dataset collected by a moving vessel, where the aim is to separate the ocean region from the rest of the image. This dataset demonstrates that the proposed model can generalise to a variety of vision problems.