Accurate prediction of pedestrians' future motions is critical for intelligent driving systems. Developing models for this task requires rich datasets containing diverse sets of samples. However, the existing naturalistic trajectory prediction datasets are generally imbalanced in favor of simpler samples and lack challenging scenarios. Such a long-tail effect causes prediction models to underperform on the tail portion of the data distribution containing safety-critical scenarios. Previous methods tackle the long-tail problem using methods such as contrastive learning and class-conditioned hypernetworks. These approaches, however, are not modular and cannot be applied to many machine learning architectures. In this work, we propose a modular model-agnostic framework for trajectory prediction that leverages a specialized mixture of experts. In our approach, each expert is trained with a specialized skill with respect to a particular part of the data. To produce predictions, we utilise a router network that selects the best expert by generating relative confidence scores. We conduct experimentation on common pedestrian trajectory prediction datasets and show that besides achieving state-of-the-art performance, our method significantly performs better on long-tail scenarios. We further conduct ablation studies to highlight the contribution of different proposed components.
Road user trajectory prediction in dynamic environments is a challenging but crucial task for various applications, such as autonomous driving. One of the main challenges in this domain is the multimodal nature of future trajectories stemming from the unknown yet diverse intentions of the agents. Diffusion models have shown to be very effective in capturing such stochasticity in prediction tasks. However, these models involve many computationally expensive denoising steps and sampling operations that make them a less desirable option for real-time safety-critical applications. To this end, we present a novel framework that leverages diffusion models for predicting future trajectories in a computationally efficient manner. To minimize the computational bottlenecks in iterative sampling, we employ an efficient sampling mechanism that allows us to maximize the number of sampled trajectories for improved accuracy while maintaining inference time in real time. Moreover, we propose a scoring mechanism to select the most plausible trajectories by assigning relative ranks. We show the effectiveness of our approach by conducting empirical evaluations on common pedestrian (UCY/ETH) and autonomous driving (nuScenes) benchmark datasets on which our model achieves state-of-the-art performance on several subsets and metrics.
Predicting pedestrian behavior is one of the main challenges for intelligent driving systems. In this paper, we present a new paradigm for evaluating egocentric pedestrian trajectory prediction algorithms. Based on various contextual information, we extract driving scenarios for a meaningful and systematic approach to identifying challenges for prediction models. In this regard, we also propose a new metric for more effective ranking within the scenario-based evaluation. We conduct extensive empirical studies of existing models on these scenarios to expose shortcomings and strengths of different approaches. The scenario-based analysis highlights the importance of using multimodal sources of information and challenges caused by inadequate modeling of ego-motion and scale of pedestrians. To this end, we propose a novel egocentric trajectory prediction model that benefits from multimodal sources of data fused in an effective and efficient step-wise hierarchical fashion and two auxiliary tasks designed to learn more robust representation of scene dynamics. We show that our approach achieves significant improvement by up to 40% in challenging scenarios compared to the past arts via empirical evaluation on common benchmark datasets.
Benchmarking is a common method for evaluating trajectory prediction models for autonomous driving. Existing benchmarks rely on datasets, which are biased towards more common scenarios, such as cruising, and distance-based metrics that are computed by averaging over all scenarios. Following such a regiment provides a little insight into the properties of the models both in terms of how well they can handle different scenarios and how admissible and diverse their outputs are. There exist a number of complementary metrics designed to measure the admissibility and diversity of trajectories, however, they suffer from biases, such as length of trajectories. In this paper, we propose a new benChmarking paRadIgm for evaluaTing trajEctoRy predIction Approaches (CRITERIA). Particularly, we propose 1) a method for extracting driving scenarios at varying levels of specificity according to the structure of the roads, models' performance, and data properties for fine-grained ranking of prediction models; 2) A set of new bias-free metrics for measuring diversity, by incorporating the characteristics of a given scenario, and admissibility, by considering the structure of roads and kinematic compliancy, motivated by real-world driving constraints. 3) Using the proposed benchmark, we conduct extensive experimentation on a representative set of the prediction models using the large scale Argoverse dataset. We show that the proposed benchmark can produce a more accurate ranking of the models and serve as a means of characterizing their behavior. We further present ablation studies to highlight contributions of different elements that are used to compute the proposed metrics.
Predicting temporally consistent road users' trajectories in a multi-agent setting is a challenging task due to unknown characteristics of agents and their varying intentions. Besides using semantic map information and modeling interactions, it is important to build an effective mechanism capable of reasoning about behaviors at different levels of granularity. To this end, we propose Dynamic goal quErieS with temporal Transductive alIgNmEnt (DESTINE) method. Unlike past arts, our approach 1) dynamically predicts agents' goals irrespective of particular road structures, such as lanes, allowing the method to produce a more accurate estimation of destinations; 2) achieves map compliant predictions by generating future trajectories in a coarse-to-fine fashion, where the coarser predictions at a lower frame rate serve as intermediate goals; and 3) uses an attention module designed to temporally align predicted trajectories via masked attention. Using the common Argoverse benchmark dataset, we show that our method achieves state-of-the-art performance on various metrics, and further investigate the contributions of proposed modules via comprehensive ablation studies.
This paper presents a novel backdoor attack called IMPlicit BackdOor Attack through Scenario InjecTION (IMPOSITION) that does not require direct poisoning of the training data. Instead, the attack leverages a realistic scenario from the training data as a trigger to manipulate the model's output during inference. This type of attack is particularly dangerous as it is stealthy and difficult to detect. The paper focuses on the application of this attack in the context of Autonomous Driving (AD) systems, specifically targeting the trajectory prediction module. To implement the attack, we design a trigger mechanism that mimics a set of cloned behaviors in the driving scene, resulting in a scenario that triggers the attack. The experimental results demonstrate that IMPOSITION is effective in attacking trajectory prediction models while maintaining high performance in untargeted scenarios. Our proposed method highlights the growing importance of research on the trustworthiness of Deep Neural Network (DNN) models, particularly in safety-critical applications. Backdoor attacks pose a significant threat to the safety and reliability of DNN models, and this paper presents a new perspective on backdooring DNNs. The proposed IMPOSITION paradigm and the demonstration of its severity in the context of AD systems are significant contributions of this paper. We highlight the impact of the proposed attacks via empirical studies showing how IMPOSITION can easily compromise the safety of AD systems.
Current research on pedestrian behavior understanding focuses on the dynamics of pedestrians and makes strong assumptions about their perceptual abilities. For instance, it is often presumed that pedestrians have omnidirectional view of the scene around them. In practice, human visual system has a number of limitations, such as restricted field of view (FoV) and range of sensing, which consequently affect decision-making and overall behavior of the pedestrians. By including explicit modeling of pedestrian perception, we can better understand its effect on their decision-making. To this end, we propose an agent-based pedestrian behavior model Intend-Wait-Perceive-Cross with three novel elements: field of vision, working memory, and scanning strategy, all motivated by findings from behavioral literature. Through extensive experimentation we investigate the effects of perceptual limitations on safe crossing decisions and demonstrate how they contribute to detectable changes in pedestrian behaviors.
Driving SMARTS is a regular competition designed to tackle problems caused by the distribution shift in dynamic interaction contexts that are prevalent in real-world autonomous driving (AD). The proposed competition supports methodologically diverse solutions, such as reinforcement learning (RL) and offline learning methods, trained on a combination of naturalistic AD data and open-source simulation platform SMARTS. The two-track structure allows focusing on different aspects of the distribution shift. Track 1 is open to any method and will give ML researchers with different backgrounds an opportunity to solve a real-world autonomous driving challenge. Track 2 is designed for strictly offline learning methods. Therefore, direct comparisons can be made between different methods with the aim to identify new promising research directions. The proposed setup consists of 1) realistic traffic generated using real-world data and micro simulators to ensure fidelity of the scenarios, 2) framework accommodating diverse methods for solving the problem, and 3) baseline method. As such it provides a unique opportunity for the principled investigation into various aspects of autonomous vehicle deployment.
Predicting pedestrian behavior is a crucial task for intelligent driving systems. Accurate predictions require a deep understanding of various contextual elements that potentially impact the way pedestrians behave. To address this challenge, we propose a novel framework that relies on different data modalities to predict future trajectories and crossing actions of pedestrians from an ego-centric perspective. Specifically, our model utilizes a cross-modal Transformer architecture to capture dependencies between different data types. The output of the Transformer is augmented with representations of interactions between pedestrians and other traffic agents conditioned on the pedestrian and ego-vehicle dynamics that are generated via a semantic attentive interaction module. Lastly, the context encodings are fed into a multi-stream decoder framework using a gated-shared network. We evaluate our algorithm on public pedestrian behavior benchmarks, PIE and JAAD, and show that our model improves state-of-the-art in trajectory and action prediction by up to 22% and 13% respectively on various metrics. The advantages brought by components of our model are investigated via extensive ablation studies.
In this paper, we present a microscopic agent-based pedestrian behavior model Intend-Wait-Cross. The model is comprised of rules representing behaviors of pedestrians as a series of decisions that depend on their individual characteristics (e.g. demographics, walking speed, law obedience) and environmental conditions (e.g. traffic flow, road structure). The model's main focus is on generating realistic crossing decision-model, which incorporates an improved formulation of time-to-collision (TTC) computation accounting for context, vehicle dynamics, and perceptual noise. Our model generates a diverse population of agents acting in a highly configurable environment. All model components, including individual characteristics of pedestrians, types of decisions they make, and environmental factors, are motivated by studies on pedestrian traffic behavior. Model parameters are calibrated using a combination of naturalistic driving data and estimates from the literature to maximize the realism of the simulated behaviors. A number of experiments validate various aspects of the model, such as pedestrian crossing patterns, and individual characteristics of pedestrians.