Evaluating the reliability of intelligent physical systems against rare catastrophic events poses a huge testing burden for real-world applications. Simulation provides a useful, if not unique, platform to evaluate the extremal risks of these AI-enabled systems before deployment. Importance Sampling (IS), while proven to be powerful for rare-event simulation, faces challenges in handling these systems because their black-box nature fundamentally undermines its efficiency guarantee. To overcome this challenge, we propose a framework called Deep Probabilistic Accelerated Evaluation (D-PrAE) to design IS, which leverages rare-event-set learning and a new notion of efficiency certificate. D-PrAE combines the dominating point method with deep neural network classifiers to achieve superior estimation efficiency. We present theoretical guarantees and demonstrate the empirical effectiveness of D-PrAE via examples on the safety-testing of self-driving algorithms that are beyond the reach of classical variance reduction techniques.
Despite an ever-growing literature on reinforcement learning algorithms and applications, much less is known about their statistical inference. In this paper, we investigate the large-sample behavior of the Q-value estimates, with closed-form characterizations of the asymptotic variances. This allows us to efficiently construct confidence regions for the Q-value and optimal value functions, and to develop policies that minimize their estimation errors. It also leads to a policy exploration strategy that relies on estimating the relative discrepancies among the Q estimates. Numerical experiments show that our exploration strategy outperforms other benchmark approaches.
In many learning problems, the training and testing data follow different distributions, and a particularly common situation is \textit{covariate shift}. To correct for sampling biases, most approaches, including the popular kernel mean matching (KMM), focus on estimating the importance weights between the two distributions. Reweighting-based methods, however, are exposed to high variance when the distributional discrepancy is large and the weights are poorly estimated. On the other hand, the alternative approach of using nonparametric regression (NR) incurs high bias when the training size is limited. In this paper, we propose and analyze a new estimator that systematically integrates the residuals of NR with KMM reweighting, based on a control-variate perspective. The proposed estimator can be shown to either strictly outperform or match the best-known existing rates for both KMM and NR, and thus is a robust combination of both estimators. Experiments show that the estimator works well in practice.
Safety evaluation of autonomous vehicles has been extensively studied in recent years, and one line of work considers Monte Carlo based evaluation. Monte Carlo based evaluation usually estimates the probability of safety-critical events, used as a safety measure, from Monte Carlo samples. These samples are generated from a stochastic model that is constructed from real-world data. In this paper, we propose an approach to assess the potential estimation error in the evaluation procedure caused by data variability. The proposed method merges the classical bootstrap method for estimating input uncertainty with a likelihood ratio based scheme that reuses experiment results. The proposed approach is economical and efficient in terms of implementation cost when assessing input uncertainty for autonomous vehicle evaluation.
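The combination of bootstrapping with likelihood-ratio reuse can be sketched on a toy problem. The example below is only an illustration under strong assumptions, not the procedure from the paper: the input model is taken to be a one-parameter exponential distribution, the "simulation output" is the sampled value itself, and the names `fit_rate`, `reweighted_estimate`, and `threshold` are hypothetical. The simulation is run once under the model fitted to the data; each bootstrap resample then refits the model and reweights the stored runs instead of simulating again.

```python
import math
import random

rng = random.Random(0)

# Hypothetical "real-world" data: 100 observations from an exponential model.
data = [rng.expovariate(1.0) for _ in range(100)]
threshold = 3.0  # hypothetical safety-critical level

def fit_rate(sample):
    """Maximum-likelihood rate of an exponential distribution."""
    return len(sample) / sum(sample)

# Run the (expensive) simulation once, under the model fitted to the data.
base_rate = fit_rate(data)
sim_outputs = [rng.expovariate(base_rate) for _ in range(5000)]
base_estimate = sum(x > threshold for x in sim_outputs) / len(sim_outputs)

def reweighted_estimate(new_rate):
    """Estimate P(X > threshold) under new_rate by reweighting the stored runs."""
    total = 0.0
    for x in sim_outputs:
        if x > threshold:
            # Likelihood ratio of the bootstrap model to the simulated model.
            numerator = new_rate * math.exp(-new_rate * x)
            denominator = base_rate * math.exp(-base_rate * x)
            total += numerator / denominator
    return total / len(sim_outputs)

# Bootstrap the input data: each resample gives a refitted model but no new runs.
boot_estimates = []
for _ in range(200):
    resample = [rng.choice(data) for _ in data]
    boot_estimates.append(reweighted_estimate(fit_rate(resample)))

boot_estimates.sort()
# Rough 95% percentile interval reflecting input (data) uncertainty.
lower, upper = boot_estimates[4], boot_estimates[194]
print(f"estimate {base_estimate:.4f}, input-uncertainty interval [{lower:.4f}, {upper:.4f}]")
```

In this sketch the 200 bootstrap evaluations cost only arithmetic on the 5,000 stored outputs, which is the source of the savings the abstract describes; the reweighting becomes unreliable if a bootstrap model lies far from the model used for the original runs.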
Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another challenge is the presence of outliers and contamination that damage the quality of inference. The parallel "divide and conquer" model selection strategy divides the observations of the full data set into roughly equal subsets and performs inference and model selection independently on each subset. After local subset inference, this method aggregates the posterior model probabilities or other model/variable selection criteria to obtain a final model, using the notion of the geometric median. This approach leads to improved concentration in finding the "correct" model and model parameters, and is also provably robust to outliers and data contamination.
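The aggregation step can be illustrated with a small sketch of the geometric median, computed here with the standard Weiszfeld iteration. The subset-level estimates are made-up numbers and the function name `geometric_median` is an assumption for illustration; the abstract does not say which algorithm or which quantities the authors aggregate in practice.

```python
import math

def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld iteration for the geometric median of points in R^d."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        # Each point is weighted by the inverse of its distance to the iterate;
        # eps guards against division by zero when the iterate hits a data point.
        weights = [1.0 / max(math.dist(p, current), eps) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ]
        converged = math.dist(updated, current) < eps
        current = updated
        if converged:
            break
    return current

# Hypothetical subset-level estimates: four subsets agree, one is contaminated.
subset_estimates = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0], [50.0, -40.0]]
median = geometric_median(subset_estimates)
mean = [sum(p[k] for p in subset_estimates) / len(subset_estimates) for k in range(2)]
print("geometric median:", median)
print("plain average:   ", mean)
```

The plain average is dragged far from the agreeing subsets by the single contaminated estimate, while the geometric median stays with the majority, which is the robustness property the abstract refers to.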
Evaluation and validation of complicated control systems are crucial to guarantee usability and safety. Failures usually happen in very rarely encountered situations, but once triggered, the consequences are disastrous. Accelerated Evaluation is a methodology that efficiently tests for those rarely occurring yet critical failures via smartly sampled test cases. The distribution used in sampling is pivotal to the performance of the method, but building a suitable distribution requires case-by-case analysis. This paper proposes a versatile approach for constructing the sampling distribution using a kernel method. The approach uses statistical learning tools to approximate the critical event sets and constructs distributions based on the unique properties of Gaussian distributions. We apply the method to the evaluation of automated vehicles. Numerical experiments show that the proposed approach can robustly identify the rare failures and significantly reduce the evaluation time.
The process for certifying highly Automated Vehicles has not yet been defined by any country in the world. Currently, companies test Automated Vehicles on public roads, which is time-consuming and inefficient. We previously proposed the Accelerated Evaluation concept, which uses modified statistics of the surrounding vehicles and Importance Sampling theory to reduce the evaluation time by several orders of magnitude, while ensuring that the evaluation results are statistically accurate. In this paper, we further improve the Accelerated Evaluation concept by using Piecewise Mixture Distribution models instead of Single Parametric Distribution models. We developed this idea and applied it to a forward collision control system reacting to vehicles making cut-in lane changes. The behavior of the cut-in vehicles was modeled based on more than 403,581 lane changes collected by the University of Michigan Safety Pilot Model Deployment Program. Simulation results confirm that the Piecewise Mixture Distribution method outperformed single parametric distribution methods in both accuracy and efficiency, and accelerated the evaluation process by almost four orders of magnitude.
Automated vehicles (AVs) must be evaluated thoroughly before their release and deployment. A widely used evaluation approach is the Naturalistic-Field Operational Test (N-FOT), which tests prototype vehicles directly on public roads. Due to the low exposure to safety-critical scenarios, N-FOTs are time-consuming and expensive to conduct. In this paper, we propose an accelerated evaluation approach for AVs. The results can be used to generate motions of the primary other vehicles to accelerate the verification of AVs in simulations and controlled experiments. Frontal collision due to unsafe cut-ins is the target crash type of this paper. Human-controlled vehicles making unsafe lane changes are modeled as the primary disturbance to AVs, based on data collected by the University of Michigan Safety Pilot Model Deployment Program. The cut-in scenarios are generated from skewed statistics of collected human driver behaviors, which produce risky testing scenarios while preserving the statistical information, so that the safety benefits of AVs in non-accelerated cases can be accurately estimated. The Cross Entropy method is used to recursively search for the optimal skewing parameters. The frequencies of occurrence of conflicts, crashes and injuries are estimated for a modeled automated vehicle, and the achieved acceleration rate is around 2,000 to 20,000. In other words, in the accelerated simulations, driving for 1,000 miles will expose the AV to challenging scenarios that would take about 2 to 20 million miles of real-world driving to encounter. This technique thus has the potential to greatly reduce the development and validation time for AVs.
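The idea of recursively searching for skewing parameters with the Cross Entropy method, then correcting for the skew with likelihood ratios, can be sketched on a toy problem. Everything below is an assumption made for illustration and is unrelated to the cut-in model of the paper: the nominal behavior is a standard exponential variable, the "crash" is the event that it exceeds `gamma`, and the skewed family is the set of exponential distributions indexed by their mean.

```python
import math
import random

rng = random.Random(1)
gamma = 10.0           # rare-event threshold; true P(X > gamma) = exp(-10) for X ~ Exp(1)
n_ce, rho = 2000, 0.1  # samples per CE iteration and elite fraction (illustrative choices)

def likelihood_ratio(x, mean):
    """Nominal Exp(1) density divided by the skewed exponential density with the given mean."""
    return mean * math.exp(-x * (1.0 - 1.0 / mean))

mean = 1.0  # start from the nominal (unskewed) distribution
for _ in range(20):
    xs = sorted(rng.expovariate(1.0 / mean) for _ in range(n_ce))
    # Raise the intermediate level to the (1 - rho) sample quantile, capped at gamma.
    level = min(gamma, xs[int((1.0 - rho) * n_ce)])
    elite = [x for x in xs if x >= level]
    weights = [likelihood_ratio(x, mean) for x in elite]
    # Cross-entropy update for the exponential family: weighted mean of elite samples.
    mean = sum(w * x for w, x in zip(weights, elite)) / sum(weights)
    if level >= gamma:
        break

# Final importance-sampling estimate under the skewed distribution.
n_is = 20000
samples = [rng.expovariate(1.0 / mean) for _ in range(n_is)]
estimate = sum(likelihood_ratio(x, mean) for x in samples if x > gamma) / n_is
print(f"skewed mean {mean:.2f}, estimate {estimate:.3e}, exact {math.exp(-gamma):.3e}")
```

Under the nominal distribution, 20,000 samples would contain on average fewer than one exceedance of the threshold, whereas the skewed distribution produces them routinely and the likelihood ratio keeps the estimate unbiased. This is the same trade that the acceleration rates quoted above quantify.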