Ensuring a safe takeover transition is extremely important in conditionally automated driving. One of the critical factors quantifying a safe takeover transition is takeover time. Previous studies identified the effects of many factors on takeover time, such as takeover lead time, non-driving tasks, modalities of the takeover requests (TORs), and scenario urgency. However, little research has predicted takeover time by considering all of these factors at the same time. Toward this end, we used eXtreme Gradient Boosting (XGBoost) to predict takeover time using a dataset from a meta-analysis study [1]. In addition, we used SHAP (SHapley Additive exPlanation) to analyze and explain the effects of the predictors on takeover time. We identified the seven most critical predictors that resulted in the best prediction performance, and examined their main effects and interaction effects on takeover time. The results showed that the proposed approach provided both good performance and explainability. Our findings have implications for the design of in-vehicle monitoring and alert systems that facilitate the interaction between drivers and automated vehicles.
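The attribution idea behind SHAP can be illustrated with exact Shapley values, which are computable by subset enumeration when the feature count is small; SHAP approximates this efficiently for large models. A minimal pure-Python sketch on a hypothetical two-feature toy model (not the paper's XGBoost pipeline; feature meanings are invented for illustration):

```python
from itertools import combinations
from math import factorial

def exact_shapley(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution of each
    feature over all subsets of the remaining features."""
    n = len(instance)
    phi = [0.0] * n
    features = list(range(n))
    for i in features:
        others = [j for j in features if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                # Feature j takes its real value if 'present', else the baseline
                with_i = [instance[j] if (j in subset or j == i) else baseline[j]
                          for j in features]
                without_i = [instance[j] if j in subset else baseline[j]
                             for j in features]
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Hypothetical toy model of takeover time with a main effect per feature
# plus an interaction term (purely illustrative, not fitted to any data)
model = lambda x: 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[1]
vals = exact_shapley(model, instance=[1.0, 1.0], baseline=[0.0, 0.0])
# Efficiency property: attributions sum to f(instance) - f(baseline)
print(vals, sum(vals))
```

The interaction term is split evenly between the two features (0.25 each), which is how Shapley values expose interaction effects like those examined in the study.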
Spiking neural networks (SNNs) are brain-inspired models with greater spatio-temporal information processing capacity and computational energy efficiency. However, as SNNs grow deeper, the memory cost of their weights has attracted increasing attention. Inspired by quantization techniques for Artificial Neural Networks (ANNs), binarized SNNs (BSNNs) have been introduced to address this memory problem. Due to the lack of suitable learning algorithms, BSNNs are usually obtained by ANN-to-SNN conversion, whose accuracy is limited by that of the trained ANNs. In this paper, we propose an ultra-low-latency adaptive local binary spiking neural network (ALBSNN) with accuracy loss estimators, which dynamically selects the network layers to be binarized by evaluating the error caused by the binarized weights during the learning process, thereby preserving network accuracy. Experimental results show that this method reduces storage space by more than 20% without losing network accuracy. At the same time, to accelerate training, a global average pooling (GAP) layer, built from a combination of convolution and pooling, replaces the fully connected layers, so that SNNs can obtain good recognition accuracy with a small number of time steps. In the extreme case of using only one time step, we still achieve 92.92%, 91.63%, and 63.54% testing accuracy on three different datasets: FashionMNIST, CIFAR-10, and CIFAR-100, respectively.
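The GAP head works by having the final convolution emit one channel per class and averaging each channel spatially, so no fully connected layer (and its large weight matrix) is needed. A minimal sketch of the pooling step, with made-up feature maps (a simplification of the paper's architecture):

```python
def global_average_pool(feature_maps):
    """Collapse each C x H x W channel to its spatial mean, giving C scores."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]

# Hypothetical head: the last conv layer emits one 2x2 channel per class,
# and GAP turns each channel directly into a class score.
fmaps = [
    [[1.0, 3.0], [5.0, 7.0]],   # channel for class 0 -> mean 4.0
    [[0.0, 2.0], [2.0, 0.0]],   # channel for class 1 -> mean 1.0
]
scores = global_average_pool(fmaps)
print(scores)  # [4.0, 1.0]
```

A C-channel map thus yields C class scores with zero additional parameters, which is what removes the fully connected layers' memory and training cost.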
A key goal of unsupervised learning is to go beyond density estimation and sample generation to reveal the structure inherent within observed data. Such structure can be expressed in the pattern of interactions between explanatory latent variables captured through a probabilistic graphical model. Although the learning of structured graphical models has a long history, much recent work in unsupervised modelling has instead emphasised flexible deep-network-based generation, either transforming independent latent generators to model complex data or assuming that distinct observed variables are derived from different latent nodes. Here, we extend the output of amortised variational inference to incorporate structured factors over multiple variables, able to capture the observation-induced posterior dependence between latents that results from "explaining away" and thus allow complex observations to depend on multiple nodes of a structured graph. We show that appropriately parameterised factors can be combined efficiently with variational message passing in elaborate graphical structures. We instantiate the framework based on Gaussian Process Factor Analysis models, and empirically evaluate its improvement over existing methods on synthetic data with known generative processes. We then fit the structured model to high-dimensional neural spiking time-series from the hippocampus of freely moving rodents, demonstrating that the model identifies latent signals that correlate with behavioural covariates.
The increasing adoption of Digital Assets (DAs), such as Bitcoin (BTC), raises the need for accurate option pricing models. Yet, existing methodologies fail to cope with the volatile nature of the emerging DAs. Many models have been proposed to address the unorthodox market dynamics and frequent disruptions in the microstructure caused by the non-stationarity and peculiar statistics of DA markets. However, they are either prone to the curse of dimensionality, as additional complexity is required to employ traditional theories, or they overfit historical patterns that may never repeat. Instead, we leverage recent advances in market regime (MR) clustering with the Implied Stochastic Volatility Model (ISVM). Time-regime clustering is a temporal clustering method that clusters the historical evolution of a market into different volatility periods, accounting for non-stationarity. ISVM can incorporate investor expectations in each of the sentiment-driven periods by using implied volatility (IV) data. In this paper, we apply this integrated time-regime clustering and ISVM method (termed MR-ISVM) to high-frequency data on BTC options at the popular trading platform Deribit. We demonstrate that MR-ISVM helps overcome the burden of complex adaptation to jumps in higher-order characteristics of option pricing models. This allows us to price the market based on the expectations of its participants in an adaptive fashion.
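The core idea of time-regime clustering, partitioning a return history into volatility periods, can be sketched with rolling volatility and a tiny 1-D k-means. This is a simplified stand-in for the paper's method, with synthetic returns and an invented window size:

```python
import statistics

def rolling_vol(returns, window):
    """Rolling standard deviation of returns as a volatility proxy."""
    return [statistics.pstdev(returns[i - window:i])
            for i in range(window, len(returns) + 1)]

def kmeans_1d(xs, k=2, iters=50):
    """Tiny 1-D k-means: assign each volatility estimate to a regime."""
    centers = sorted(xs)[::max(1, len(xs) // k)][:k]
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers

# Synthetic returns: a calm stretch followed by a turbulent one (illustrative)
returns = [0.01, -0.01, 0.02, -0.02, 0.01, 0.15, -0.2, 0.18, -0.12, 0.16]
vols = rolling_vol(returns, window=3)
labels, centers = kmeans_1d(vols)
print(labels)
```

Each regime label would then select which period's IV data the ISVM calibration uses, so investor expectations are modelled per sentiment-driven period rather than pooled across the whole non-stationary history.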
With the spread of Data Center Interconnect (DCI) and local 5G, there is a growing need for dynamically established connections between customer locations over high-capacity optical links. However, link parameters such as the signal power profile and amplifier gains are often unknown and have to be measured by experts, preventing dynamic path provisioning due to the time-consuming manual measurements. Although several techniques for estimating the unknown parameters of such alien access links have been proposed, no work has presented an architecture and protocol that drive these estimation techniques to establish an optical path between customer locations. Our study aims to automatically connect customer-owned transceivers via alien access links with optimal quality of transmission (QoT). We first propose an architecture and protocol for cooperative optical path design between a customer and carrier, utilizing a state-of-the-art technique for estimating link parameters. We then implement the proposed protocol in a software-defined network (SDN) controller and white-box transponders using an open API. The experiments demonstrate that an optical path is dynamically established via alien access links in 137 seconds from the transceiver's cold start. Lastly, we discuss the QoT accuracy obtained with this method and the remaining issues.
Temporal action localization (TAL) is the task of identifying a set of actions in a video, which involves localizing the start and end frames and classifying each action instance. Existing methods have addressed this task by using predefined anchor windows or heuristic bottom-up boundary-matching strategies, which are major bottlenecks in inference time. Additionally, the main challenge is the inability to capture long-range actions due to a lack of global contextual information. In this paper, we present a novel anchor-free framework, referred to as HTNet, which predicts a set of <start time, end time, class> triplets from a video based on a Transformer architecture. After predicting coarse boundaries, we refine them through a background feature sampling (BFS) module and hierarchical Transformers, which enable our model to aggregate global contextual information and effectively exploit the inherent semantic relationships in a video. We demonstrate how our method localizes accurate action instances and achieves state-of-the-art performance on two TAL benchmark datasets: THUMOS14 and ActivityNet 1.3.
An accurate model of natural speech directivity is an important step toward achieving realistic vocal presence within a virtual communication setting. In this article, we propose a method to estimate and reconstruct the spatial energy distribution pattern of natural, unconstrained speech. We detail our method in two stages. First, using recordings of speech captured by a real, static microphone array, we create a virtual array that tracks with the movement of the speaker over time. We use this egocentric virtual array to measure and encode the high-resolution directivity pattern of the speech signal as it dynamically evolves with natural speech and movement. Second, utilizing this encoded directivity representation, we train a machine learning model to estimate the full, dynamic directivity pattern when given a limited set of speech signals, as would be the case when speech is recorded using the microphones on a head-mounted display (HMD). We examine a variety of model architectures and training paradigms, and discuss the utility and practicality of each implementation. Our results demonstrate that neural networks can be used to regress from limited speech information to an accurate, dynamic estimation of the full directivity pattern.
Topological methods can provide a way of proposing new metrics and methods of scrutinising data that might otherwise be overlooked. In this work, a method of quantifying the shape of data via topological data analysis (TDA) is introduced. The main tool within TDA is persistent homology, a method of quantifying the shape of data over a range of length scales. The required background and a method of computing persistent homology are briefly discussed. Ideas from TDA are then applied to nonlinear dynamics to analyse some common attractors, by calculating their embedding dimension and then assessing their general topologies. A method is also proposed that uses TDA to determine the optimal delay for a time-delay embedding. TDA is also applied to a case study in structural health monitoring on the Z24 Bridge, where it is used to scrutinise different data partitions, classified by the conditions under which the data were collected. A metric from TDA is used to compare data between the partitions. The results presented demonstrate that the presence of damage alters the manifold shape more significantly than the effects of temperature.
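The simplest instance of persistent homology, dimension 0 under the Vietoris-Rips filtration, tracks connected components as the length scale grows and can be computed with a union-find over edges sorted by length. A minimal sketch on a made-up point cloud (real TDA software such as GUDHI or Ripser also computes higher-dimensional features like loops):

```python
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistence: each point is born at scale 0, and a
    component dies at the edge length that first merges it into another."""
    n = len(points)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)          # one component dies at this scale
    # Persistence pairs (birth, death); the final component never dies
    return [(0.0, d) for d in deaths] + [(0.0, float("inf"))]

# Two well-separated pairs: short-lived merges within each pair,
# then one long-lived bar until the pairs finally connect
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
print(h0_persistence(pts))
```

Long bars in the resulting diagram indicate structure that persists across length scales, which is the kind of shape information used to compare the data partitions in the bridge case study.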
Traffic forecasting is an integral part of intelligent transportation systems (ITS). Achieving a high prediction accuracy is a challenging task due to a high level of dynamics and complex spatial-temporal dependency of road networks. For this task, we propose Graph Attention-Convolution-Attention Networks (GACAN). The model uses a novel Att-Conv-Att (ACA) block which contains two graph attention layers and one spectral-based GCN layer sandwiched in between. The graph attention layers are meant to capture temporal features while the spectral-based GCN layer is meant to capture spatial features. The main novelty of the model is the integration of time series of four different time granularities: the original time series, together with hourly, daily, and weekly time series. Unlike previous work that used multi-granularity time series by handling every time series separately, GACAN combines the outcome of processing all time series after each graph attention layer. Thus, the effects of different time granularities are integrated throughout the model. We perform a series of experiments on three real-world datasets. The experimental results verify the advantage of using multi-granularity time series and that the proposed GACAN model outperforms the state-of-the-art baselines.
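The multi-granularity inputs can be sketched as mean-aggregations of the raw series at coarser steps; the model then processes all granularities in parallel and combines them after each graph attention layer. A minimal sketch of the aggregation step, with invented sampling rates (the datasets' actual granularities may differ):

```python
def downsample(series, factor):
    """Mean-aggregate a series by the given factor (dropping a ragged tail)."""
    return [sum(series[i:i + factor]) / factor
            for i in range(0, len(series) - factor + 1, factor)]

# Stand-in for raw 5-minute traffic readings; an hourly series would use
# factor 12, a daily series factor 288, and so on.
raw = list(range(24))
hourly = downsample(raw, 12)
print(hourly)  # [5.5, 17.5]
```

Each coarser series smooths short-term noise and exposes periodic patterns (rush hours, weekday cycles) that are hard to learn from the raw series alone.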
Bundle Adjustment (BA) refers to the problem of simultaneously determining sensor poses and scene geometry, which is a fundamental problem in robot vision. This paper presents an efficient and consistent bundle adjustment method for lidar sensors. The method employs edge and plane features to represent the scene geometry, and directly minimizes the natural Euclidean distance from each raw point to the respective geometry feature. A nice property of this formulation is that the geometry features can be solved analytically, drastically reducing the dimension of the numerical optimization. To represent and solve the resultant optimization problem more efficiently, this paper then proposes a novel concept, {\it point clusters}, which encodes all raw points associated with the same feature by a compact set of parameters, the {\it point cluster coordinates}. We derive the closed-form derivatives, up to the second order, of the BA optimization based on the point cluster coordinates and show their theoretical properties, such as the null spaces and sparsity. Based on these theoretical results, this paper develops an efficient second-order BA solver. Besides estimating the lidar poses, the solver also exploits the second-order information to estimate the pose uncertainty caused by measurement noise, leading to consistent estimates of lidar poses. Moreover, thanks to the use of point clusters, the developed solver fundamentally avoids the enumeration of each raw point (which is very time-consuming due to their large number) in all steps of the optimization: cost evaluation, derivative evaluation, and uncertainty evaluation. The implementation of our method is open sourced to benefit the robotics community and beyond.
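The point-cluster idea can be illustrated for a plane feature: summarizing N points by their count, sum, and second-moment matrix lets the sum of squared point-to-plane distances be evaluated without touching any raw point, since sum_i (n.p_i + d)^2 = n^T S2 n + 2 d n^T S1 + N d^2. A minimal sketch with made-up points (the paper's actual parameterization and solver are more elaborate):

```python
def cluster_coords(points):
    """Compact 'point cluster' summary of 3-D points:
    count N, first moment S1 = sum p, second moment S2 = sum p p^T."""
    N = len(points)
    S1 = [sum(p[k] for p in points) for k in range(3)]
    S2 = [[sum(p[a] * p[b] for p in points) for b in range(3)]
          for a in range(3)]
    return N, S1, S2

def plane_cost(cluster, normal, d):
    """Sum of squared point-to-plane distances sum_i (n.p_i + d)^2,
    evaluated from the cluster summary alone -- no per-point loop."""
    N, S1, S2 = cluster
    nS2n = sum(normal[a] * S2[a][b] * normal[b]
               for a in range(3) for b in range(3))
    nS1 = sum(normal[k] * S1[k] for k in range(3))
    return nS2n + 2.0 * d * nS1 + N * d * d

pts = [(0.0, 0.0, 0.1), (1.0, 0.0, -0.1), (0.0, 1.0, 0.1), (1.0, 1.0, -0.1)]
c = cluster_coords(pts)
# Plane z = 0 (unit normal (0,0,1), offset 0): cost equals sum of z_i^2
print(plane_cost(c, (0.0, 0.0, 1.0), 0.0))  # ~0.04
```

Because the summary has fixed size regardless of how many points hit the feature, cost, derivative, and uncertainty evaluation all scale with the number of features rather than the number of raw lidar points.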