Training vision-based Urban Autonomous driving models is a challenging problem, which is highly researched in recent times. Training such models is a data-intensive task requiring the storage and processing of vast volumes of (possibly redundant) driving video data. In this paper, we study the problem of developing data-efficient autonomous driving systems. In this context, we study the problem of multi-criteria online video frame subset selection. We study convex optimization-based solutions and show that they are unable to provide solutions with high weightage to the loss of selected video frames. We design a novel convex optimization-based multi-criteria online subset selection algorithm that uses a thresholded concave function of selection variables. We also propose and study a submodular optimization-based algorithm. Extensive experiments using the driving simulator CARLA show that we are able to drop 80% of the frames while succeeding to complete 100% of the episodes w.r.t. the model trained on 100% data, in the most difficult task of taking turns. This results in a training time of less than 30% compared to training on the whole dataset. We also perform detailed experiments on prediction performances of various affordances used by the Conditional Affordance Learning (CAL) model and show that our subset selection improves performance on the crucial affordance "Relative Angle" during turns.
Lack of adequate training samples and noisy high-dimensional features are key challenges faced by Motor Imagery (MI) decoding algorithms for electroencephalogram (EEG) based Brain-Computer Interface (BCI). To address these challenges, inspired from neuro-physiological signatures of MI, this paper proposes a novel Filter-Bank Convolutional Network (FBCNet) for MI classification. FBCNet employs a multi-view data representation followed by spatial filtering to extract spectro-spatially discriminative features. This multistage approach enables efficient training of the network even when limited training data is available. More significantly, in FBCNet, we propose a novel Variance layer that effectively aggregates the EEG time-domain information. With this design, we compare FBCNet with state-of-the-art (SOTA) BCI algorithm on four MI datasets: The BCI competition IV dataset 2a (BCIC-IV-2a), the OpenBMI dataset, and two large datasets from chronic stroke patients. The results show that, by achieving 76.20% 4-class classification accuracy, FBCNet sets a new SOTA for BCIC-IV-2a dataset. On the other three datasets, FBCNet yields up to 8% higher binary classification accuracies. Additionally, using explainable AI techniques we present one of the first reports about the differences in discriminative EEG features between healthy subjects and stroke patients. Also, the FBCNet source code is available at https://github.com/ravikiran-mane/FBCNet.
Deformable objects present a formidable challenge for robotic manipulation due to the lack of canonical low-dimensional representations and the difficulty of capturing, predicting, and controlling such objects. We construct compact topological representations to capture the state of highly deformable objects that are topologically nontrivial. We develop an approach that tracks the evolution of this topological state through time. Under several mild assumptions, we prove that the topology of the scene and its evolution can be recovered from point clouds representing the scene. Our further contribution is a method to learn predictive models that take a sequence of past point cloud observations as input and predict a sequence of topological states, conditioned on target/future control actions. Our experiments with highly deformable objects in simulation show that the proposed multistep predictive models yield more precise results than those obtained from computational topology libraries. These models can leverage patterns inferred across various objects and offer fast multistep predictions suitable for real-time applications.
3D object detection algorithms for autonomous driving reason about 3D obstacles either from 3D birds-eye view or perspective view or both. Recent works attempt to improve the detection performance via mining and fusing from multiple egocentric views. Although the egocentric perspective view alleviates some weaknesses of the birds-eye view, the sectored grid partition becomes so coarse in the distance that the targets and surrounding context mix together, which makes the features less discriminative. In this paper, we generalize the research on 3D multi-view learning and propose a novel multi-view-based 3D detection method, named X-view, to overcome the drawbacks of the multi-view methods. Specifically, X-view breaks through the traditional limitation about the perspective view whose original point must be consistent with the 3D Cartesian coordinate. X-view is designed as a general paradigm that can be applied on almost any 3D detectors based on LiDAR with only little increment of running time, no matter it is voxel/grid-based or raw-point-based. We conduct experiments on KITTI and NuScenes datasets to demonstrate the robustness and effectiveness of our proposed X-view. The results show that X-view obtains consistent improvements when combined with four mainstream state-of-the-art 3D methods: SECOND, PointRCNN, Part-A^2, and PV-RCNN.
Cross-border security is of topmost priority for societies. Economies lose billions each year due to counterfeiters and other threats. Security checkpoints equipped with X-ray Security Systems (NIIS-Non-Intrusive Inspection Systems) like airports, ports, border control and customs authorities tackle the myriad of threats by using NIIS to inspect bags, air, land, sea and rail cargo, and vehicles. The reliance on the X-ray scanning systems necessitates their continuous 24/7 functioning being provided for. Hence the need for their working condition being closely monitored and preemptive actions being taken to reduce the overall X-ray systems downtime. In this paper, we present a predictive maintenance decision support system, abbreviated as PMT4NIIS (Predictive Maintenance Tool for Non-Intrusive Inspection Systems), which is a kind of augmented analytics platforms that provides real-time AI-generated warnings for upcoming risk of system malfunctioning leading to possible downtime. The industrial platform is the basis of a 24/7 Service Desk and Monitoring center for the working condition of various X-ray Security Systems.
A prediction interval covers a future observation from a random process in repeated sampling, and is typically constructed by identifying a pivotal quantity that is also an ancillary statistic. Outside of normality it can sometimes be challenging to identify an ancillary pivotal quantity without assuming some of the model parameters are known. A common solution is to identify an appropriate transformation of the data that yields normally distributed observations, or to treat model parameters as random variables and construct a Bayesian predictive distribution. Analogously, a tolerance interval covers a population percentile in repeated sampling and poses similar challenges outside of normality. The approach we consider leverages a link function that results in a pivotal quantity that is approximately normally distributed and produces tolerance and prediction intervals that work well for non-normal models where identifying an exact pivotal quantity may be intractable. This is the approach we explore when modeling recruitment interarrival time in clinical trials, and ultimately, time to complete recruitment.
There is an increasing interest in the NLP community in capturing variations in the usage of language, either through time (i.e., semantic drift), across regions (as dialects or variants) or in different social contexts (i.e., professional or media technolects). Several successful dynamical embeddings have been proposed that can track semantic change through time. Here we show that a model with a central word representation and a slice-dependent contribution can learn word embeddings from different corpora simultaneously. This model is based on a star-like representation of the slices. We apply it to The New York Times and The Guardian newspapers, and we show that it can capture both temporal dynamics in the yearly slices of each corpus, and language variations between US and UK English in a curated multi-source corpus. We provide an extensive evaluation of this methodology.
The widespread use of Batch Normalization has enabled training deeper neural networks with more stable and faster results. However, the Batch Normalization works best using large batch size during training and as the state-of-the-art segmentation convolutional neural network architectures are very memory demanding, large batch size is often impossible to achieve on current hardware. We evaluate the alternative normalization methods proposed to solve this issue on a problem of binary spine segmentation from 3D CT scan. Our results show the effectiveness of Instance Normalization in the limited batch size neural network training environment. Out of all the compared methods the Instance Normalization achieved the highest result with Dice coefficient = 0.96 which is comparable to our previous results achieved by deeper network with longer training time. We also show that the Instance Normalization implementation used in this experiment is computational time efficient when compared to the network without any normalization method.
The global ambitions of a carbon-neutral society necessitate a stable and robust smart grid that capitalises on frequency reserves of renewable energy. Frequency reserves are resources that adjust power production or consumption in real time to react to a power grid frequency deviation. Revenue generation motivates the availability of these resources for managing such deviations. However, limited research has been conducted on data-driven decisions and optimal bidding strategies for trading such capacities in multiple frequency reserves markets. We address this limitation by making the following research contributions. Firstly, a generalised model is designed based on an extensive study of critical characteristics of global frequency reserves markets. Secondly, three bidding strategies are proposed, based on this market model, to capitalise on price peaks in multi-stage markets. Two strategies are proposed for non-reschedulable loads, in which case the bidding strategy aims to select the market with the highest anticipated price, and the third bidding strategy focuses on rescheduling loads to hours on which highest reserve market prices are anticipated. The third research contribution is an Artificial Intelligence (AI) based bidding optimization framework that implements these three strategies, with novel uncertainty metrics that supplement data-driven price prediction. Finally, the framework is evaluated empirically using a case study of multiple frequency reserves markets in Finland. The results from this evaluation confirm the effectiveness of the proposed bidding strategies and the AI-based bidding optimization framework in terms of cumulative revenue generation, leading to an increased availability of frequency reserves.
In this paper, we propose a whole-body planning framework that unifies dynamic locomotion and manipulation tasks by formulating a single multi-contact optimal control problem. We model the hybrid nature of a generic multi-limbed mobile manipulator as a switched system, and introduce a set of constraints that can encode any pre-defined gait sequence or manipulation schedule in the formulation. Since the system is designed to actively manipulate its environment, the equations of motion are composed by augmenting the robot's centroidal dynamics with the manipulated-object dynamics. This allows us to describe any high-level task in the same cost/constraint function. The resulting planning framework could be solved on the robot's onboard computer in real-time within a model predictive control scheme. This is demonstrated in a set of real hardware experiments done in free-motion, such as base or end-effector pose tracking, and while pushing/pulling a heavy resistive door. Robustness against model mismatches and external disturbances is also verified during these test cases.