Krzysztof Czarnecki

Towards Object Re-Identification from Point Clouds for 3D MOT

May 17, 2023
Benjamin Thérien, Chengjie Huang, Adrian Chow, Krzysztof Czarnecki

In this work, we study the problem of object re-identification (ReID) in a 3D multi-object tracking (MOT) context, by learning to match pairs of objects from cropped (e.g., using their predicted 3D bounding boxes) point cloud observations. We are not concerned with SOTA performance for 3D MOT, however. Instead, we seek to answer the following question: in a realistic tracking-by-detection context, how does object ReID from point clouds perform relative to ReID from images? To enable such a study, we propose a lightweight matching head that can be concatenated to any set or sequence processing backbone (e.g., PointNet or ViT), creating a family of comparable object ReID networks for both modalities. Run in siamese style, our proposed point-cloud ReID networks can make thousands of pairwise comparisons in real time (10 Hz). Our findings demonstrate that their performance increases with higher sensor resolution and approaches that of image ReID when observations are sufficiently dense. Additionally, we investigate our networks' ability to enhance 3D multi-object tracking, showing that our point-cloud ReID networks can successfully re-identify objects which led a strong motion-based tracker into error. To our knowledge, we are the first to study real-time object re-identification from point clouds in a 3D multi-object tracking context.
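
As a rough illustration of the setup, the sketch below (PyTorch) pairs a shared point-cloud backbone with a lightweight matching head run in siamese style. The toy backbone, the max-pooling step, and the `MatchingHead` module are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MatchingHead(nn.Module):
    """Scores whether two object embeddings belong to the same instance."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 1),  # logit: same object vs. different object
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([a, b], dim=-1)).squeeze(-1)

# Toy PointNet-like backbone: per-point MLP + permutation-invariant pooling.
backbone = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 256))
head = MatchingHead(256)

# Two cropped point-cloud observations (batch, points, xyz); shared weights.
crop_a, crop_b = torch.randn(1, 64, 3), torch.randn(1, 64, 3)
emb_a = backbone(crop_a).max(dim=1).values
emb_b = backbone(crop_b).max(dim=1).values
match_prob = torch.sigmoid(head(emb_a, emb_b))  # probability of same object
```

Because embeddings can be cached per object, each track-detection comparison costs only one small MLP call, which is what makes thousands of pairwise comparisons per frame plausible in real time.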

Revealed Multi-Objective Utility Aggregation in Human Driving

Mar 13, 2023
Atrisha Sarkar, Kate Larson, Krzysztof Czarnecki

A central design problem in game-theoretic analysis is the estimation of the players' utilities. In many real-world interactive situations of human decision making, including human driving, the utilities are multi-objective in nature; therefore, estimating the parameters of aggregation, i.e., the mapping of multi-objective utilities to a scalar value, becomes an essential part of game construction. However, estimating this parameter from observational data introduces several challenges due to a host of unobservable factors, including the underlying modality of aggregation and the possibly boundedly rational behaviour model that generated the observation. Based on the concept of rationalisability, we develop algorithms for estimating multi-objective aggregation parameters for two common aggregation methods, weighted and satisficing aggregation, and for both strategic and non-strategic reasoning models. Using three different datasets, we provide insights into how human drivers aggregate the utilities of safety and progress, as well as the situational dependence of the aggregation process. Additionally, we show that irrespective of the specific solution concept used for solving the games, a data-driven estimation of utility aggregation significantly improves the predictive accuracy of behaviour models with respect to observed human behaviour.
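
To make the two aggregation modes concrete, here is a minimal sketch for a two-objective (safety, progress) utility vector. The satisficing form shown, in which progress only matters once safety clears a threshold, is one common formalisation and an assumption here, not necessarily the paper's exact definition.

```python
def weighted_aggregation(u_safety: float, u_progress: float, w: float) -> float:
    """Scalarise by a convex combination with weight w on safety."""
    return w * u_safety + (1.0 - w) * u_progress

def satisficing_aggregation(u_safety: float, u_progress: float,
                            threshold: float) -> float:
    """Satisficing: safety dominates until it is 'good enough',
    then progress drives the choice."""
    if u_safety < threshold:
        return u_safety                # safety not yet satisficed
    return threshold + u_progress      # above threshold, trade on progress

# Revealed-preference estimation then searches over w (or threshold) for the
# value under which observed driver actions are (boundedly) rational.
```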

FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs

Nov 27, 2022
Luke Rowe, Martin Ethier, Eli-Henry Dykhne, Krzysztof Czarnecki

Predicting the future motion of road agents is a critical task in an autonomous driving pipeline. In this work, we address the problem of generating a set of scene-level, or joint, future trajectory predictions in multi-agent driving scenarios. To this end, we propose FJMP, a Factorized Joint Motion Prediction framework for multi-agent interactive driving scenarios. FJMP models the future scene interaction dynamics as a sparse directed interaction graph, where edges denote explicit interactions between agents. We then prune the graph into a directed acyclic graph (DAG) and decompose the joint prediction task into a sequence of marginal and conditional predictions according to the partial ordering of the DAG, where joint future trajectories are decoded using a directed acyclic graph neural network (DAGNN). We conduct experiments on the INTERACTION and Argoverse 2 datasets and demonstrate that FJMP produces more accurate and scene-consistent joint trajectory predictions than non-factorized approaches, especially on the most interactive and kinematically interesting agents. FJMP ranks 1st on the multi-agent test leaderboard of the INTERACTION dataset.
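
The factorized decoding can be sketched as a pass over the DAG in topological order, so that each agent's future is predicted conditioned on the already-decoded futures of its influencers. Here `decode_marginal` and `decode_conditional` are hypothetical stand-ins for FJMP's DAGNN decoder.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def decode_joint(agents, influencers, decode_marginal, decode_conditional):
    """influencers: dict mapping each agent to the set of agents that
    influence it (the pruned DAG's edges, influencer -> influenced)."""
    graph = {a: influencers.get(a, set()) for a in agents}
    trajectories = {}
    for agent in TopologicalSorter(graph).static_order():
        parents = graph[agent]
        if not parents:
            # Source node: no influencers, so predict a marginal trajectory.
            trajectories[agent] = decode_marginal(agent)
        else:
            # Condition on the already-decoded futures of the influencers.
            context = {p: trajectories[p] for p in parents}
            trajectories[agent] = decode_conditional(agent, context)
    return trajectories
```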

A Closer Look at Robustness to L-infinity and Spatial Perturbations and their Composition

Oct 05, 2022
Luke Rowe, Benjamin Thérien, Krzysztof Czarnecki, Hongyang Zhang

In adversarial machine learning, the popular $\ell_\infty$ threat model has been the focus of much previous work. While this mathematical definition of imperceptibility successfully captures an infinite set of additive image transformations that a model should be robust to, this is only a subset of all transformations which leave the semantic label of an image unchanged. Indeed, previous work also considered robustness to spatial attacks as well as other semantic transformations; however, designing defense methods against the composition of spatial and $\ell_{\infty}$ perturbations remains relatively underexplored. In the following, we improve the understanding of this seldom-investigated compositional setting. We prove theoretically that no linear classifier can achieve more than trivial accuracy against a composite adversary in a simple statistical setting, illustrating its difficulty. We then investigate how state-of-the-art $\ell_{\infty}$ defenses can be adapted to this novel threat model and study their performance against compositional attacks. We find that our newly proposed TRADES$_{\text{All}}$ strategy performs the strongest of all. Analyzing the Lipschitz constant of its logits under rotation-translation (RT) transformations of different sizes, we find that TRADES$_{\text{All}}$ remains stable over a wide range of RT transformations with and without $\ell_\infty$ perturbations.

* 16 pages, 5 figures, and 3 tables 
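
A sketch of the composite threat model: a spatial rotation-translation transform followed by a PGD search in the $\ell_\infty$ ball. The fixed RT parameters and the schedule are illustrative; in practice the spatial component would itself be adversarially searched (e.g., over a grid), and the TRADES$_{\text{All}}$ training objective is separate from this attack.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def composite_attack(model, x, y, eps=8/255, alpha=2/255, steps=10,
                     angle=10.0, translate=(4, 4)):
    # Spatial component: one rotation-translation of the input image batch.
    x_rt = TF.affine(x, angle=angle, translate=list(translate),
                     scale=1.0, shear=[0.0])
    # Additive component: PGD within the L-infinity ball around x_rt.
    delta = torch.zeros_like(x_rt, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_rt + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = (x_rt + delta).clamp(0, 1) - x_rt   # stay in valid image range
        delta = delta.detach().requires_grad_(True)
    return (x_rt + delta).detach()
```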

Interpretable Deep Tracking

Oct 03, 2022
Benjamin Thérien, Krzysztof Czarnecki

Imagine experiencing a crash as the passenger of an autonomous vehicle. Wouldn't you want to know why it happened? Current end-to-end optimizable deep neural networks (DNNs) in 3D detection, multi-object tracking, and motion forecasting provide little to no explanation of how they make their decisions. To help bridge this gap, we design an end-to-end optimizable multi-object tracking architecture and training protocol inspired by the recently proposed method of interchange intervention training (IIT). By enumerating different tracking decisions and associated reasoning procedures, we can train individual networks to reason about the possible decisions via IIT. Each network's decisions can be explained by the high-level structural causal model (SCM) it is trained in alignment with. Moreover, our proposed model learns to rank these outcomes, leveraging the promise of deep learning in end-to-end training, while being inherently interpretable.
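
The core IIT operation can be sketched as an activation swap: run a "source" input, cache the activation of the network layer aligned with an SCM variable, patch it into the "base" run, and supervise the patched output with the counterfactual answer the SCM gives under the same intervention. The toy model, layer choice, and hook mechanics below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
aligned_layer = model[1]  # layer aligned with one high-level SCM variable

def interchange_loss(base_x, source_x, counterfactual_y):
    cache = {}
    def save(module, inputs, output):
        cache["act"] = output                  # record the source activation
    def patch(module, inputs, output):
        return cache["act"]                    # overwrite the base activation

    handle = aligned_layer.register_forward_hook(save)
    model(source_x)                            # 1) source run, cache activation
    handle.remove()
    handle = aligned_layer.register_forward_hook(patch)
    logits = model(base_x)                     # 2) base run, patched activation
    handle.remove()
    # 3) the patched network should match the SCM's counterfactual output
    return F.cross_entropy(logits, counterfactual_y)
```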

Out-of-Distribution Detection for LiDAR-based 3D Object Detection

Sep 28, 2022
Chengjie Huang, Van Duong Nguyen, Vahdat Abdelzad, Christopher Gus Mannes, Luke Rowe, Benjamin Therien, Rick Salay, Krzysztof Czarnecki

3D object detection is an essential part of automated driving, and deep neural networks (DNNs) have achieved state-of-the-art performance for this task. However, deep models are notorious for assigning high confidence scores to out-of-distribution (OOD) inputs, that is, inputs that are not drawn from the training distribution. Detecting OOD inputs is challenging and essential for the safe deployment of models. OOD detection has been studied extensively for the classification task, but it has not received enough attention for the object detection task, specifically LiDAR-based 3D object detection. In this paper, we focus on the detection of OOD inputs for LiDAR-based 3D object detection. We formulate what OOD inputs mean for object detection and propose to adapt several OOD detection methods to object detection, which we accomplish via our proposed feature extraction method. To evaluate OOD detection methods, we develop a simple but effective technique of generating OOD objects for a given object detection model. Our evaluation based on the KITTI dataset shows that different OOD detection methods have biases toward detecting specific OOD objects, which emphasizes the importance of combining OOD detection methods and of further research in this direction.

* Accepted at ITSC 2022 
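
As one concrete instance of the adaptation, a per-object feature pooled from the detector for each predicted box can be scored with a Mahalanobis distance to the training feature distribution. This is a standard OOD scorer of the kind the abstract alludes to, not necessarily the paper's exact formulation; the per-object feature pooling itself is what the proposed extraction method would supply.

```python
import numpy as np

def fit_gaussian(train_feats: np.ndarray):
    """train_feats: (N, D) per-object features from in-distribution data."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])          # regularise for invertibility
    return mu, np.linalg.inv(cov)

def mahalanobis_ood_score(feat: np.ndarray, mu, cov_inv) -> float:
    """Higher score = farther from the training distribution = more OOD."""
    d = feat - mu
    return float(d @ cov_inv @ d)
```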

SSL-Lanes: Self-Supervised Learning for Motion Forecasting in Autonomous Driving

Jun 28, 2022
Prarthana Bhattacharyya, Chengjie Huang, Krzysztof Czarnecki

Self-supervised learning (SSL) is an emerging technique that has been successfully employed to train convolutional neural networks (CNNs) and graph neural networks (GNNs) for more transferable, generalizable, and robust representation learning. However, its potential in motion forecasting for autonomous driving has rarely been explored. In this study, we report the first systematic exploration and assessment of incorporating self-supervision into motion forecasting. We first propose and investigate four novel self-supervised learning tasks for motion forecasting with theoretical rationale and quantitative and qualitative comparisons on the challenging large-scale Argoverse dataset. Secondly, we point out that our auxiliary SSL-based learning setup not only outperforms, in prediction accuracy, forecasting methods that use transformers, complicated fusion mechanisms, and sophisticated online dense goal candidate optimization algorithms, but also has low inference time and architectural complexity. Lastly, we conduct several experiments to understand why SSL improves motion forecasting. Code is open-sourced at https://github.com/AutoVision-cloud/SSL-Lanes.

* 16 pages, 7 figures 
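
The auxiliary setup can be sketched as a weighted sum of the forecasting loss and a pretext loss computed on the same scene encoding. The masking-based pretext task below is illustrative only (the paper proposes and compares four specific SSL tasks), and `encoder`, `forecaster`, and `ssl_head` are hypothetical components.

```python
import torch
import torch.nn.functional as F

def total_loss(encoder, forecaster, ssl_head, scene, targets, lam=0.5):
    feats = encoder(scene)                     # shared scene/lane-graph encoding
    loss_forecast = F.smooth_l1_loss(forecaster(feats), targets)

    # Pretext task: zero out a random subset of node features and train an
    # auxiliary head to reconstruct them from the remaining context.
    mask = torch.rand(feats.shape[0], device=feats.device) < 0.15
    corrupted = feats.clone()
    corrupted[mask] = 0.0
    loss_ssl = F.mse_loss(ssl_head(corrupted)[mask], feats[mask].detach())

    return loss_forecast + lam * loss_ssl      # auxiliary SSL as a regulariser
```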

LiDAR-MIMO: Efficient Uncertainty Estimation for LiDAR-based 3D Object Detection

Jun 01, 2022
Matthew Pitropov, Chengjie Huang, Vahdat Abdelzad, Krzysztof Czarnecki, Steven Waslander

The estimation of uncertainty in robotic vision, such as 3D object detection, is an essential component in developing safe autonomous systems aware of their own performance. However, the deployment of current uncertainty estimation methods in 3D object detection remains challenging due to timing and computational constraints. To tackle this issue, we propose LiDAR-MIMO, an adaptation of the multi-input multi-output (MIMO) uncertainty estimation method to the LiDAR-based 3D object detection task. Our method modifies the original MIMO by performing multi-input at the feature level to ensure the detection, uncertainty estimation, and runtime performance benefits are retained despite the limited capacity of the underlying detector and the large computational costs of point cloud processing. We compare LiDAR-MIMO with MC dropout and ensembles as baselines and show comparable uncertainty estimation results with only a small number of output heads. Further, LiDAR-MIMO can be configured to be twice as fast as MC dropout and ensembles, while achieving higher mAP than MC dropout and approaching that of ensembles.

* 8 pages, 4 figures and 5 tables. Accepted in IEEE IV 2022 
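
The feature-level MIMO idea can be sketched as follows: each of the M inputs is featurized cheaply, the features are merged so the expensive trunk runs only once, and M heads each decode one input. At test time the same scan fills all M slots, and the disagreement between heads provides the uncertainty estimate. `pillar_encoder`, `backbone`, and `make_head` are hypothetical stand-ins, and summation is just one plausible merge.

```python
import torch
import torch.nn as nn

class LiDARMIMOSketch(nn.Module):
    def __init__(self, pillar_encoder, backbone, make_head, num_heads=2):
        super().__init__()
        self.pillar_encoder = pillar_encoder   # cheap per-input featurizer
        self.backbone = backbone               # heavy shared trunk, run once
        self.heads = nn.ModuleList(make_head() for _ in range(num_heads))

    def forward(self, point_clouds):
        # Training: M distinct point clouds. Inference: one scan repeated M times.
        feats = torch.stack([self.pillar_encoder(pc) for pc in point_clouds])
        merged = feats.sum(dim=0)              # multi-input at the feature level
        shared = self.backbone(merged)         # single expensive backbone pass
        return [head(shared) for head in self.heads]  # one output per input
```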

A Hierarchical Pedestrian Behavior Model to Generate Realistic Human Behavior in Traffic Simulation

Jun 01, 2022
Scott Larter, Rodrigo Queiroz, Sean Sedwards, Atrisha Sarkar, Krzysztof Czarnecki

Modelling pedestrian behavior is crucial in the development and testing of autonomous vehicles. In this work, we present a hierarchical pedestrian behavior model that generates high-level decisions through the use of behavior trees, in order to produce maneuvers executed by a low-level motion planner using an adapted Social Force model. A full implementation of our work is integrated into GeoScenario Server, a scenario definition and execution engine, extending its vehicle simulation capabilities with pedestrian simulation. The extended environment allows simulating test scenarios involving both vehicles and pedestrians to assist in the scenario-based testing process of autonomous vehicles. The presented hierarchical model is evaluated on two real-world data sets collected at separate locations with different road structures. Our model is shown to replicate the real-world pedestrians' trajectories with a high degree of fidelity and a decision-making accuracy of 98% or better, given only high-level routing information for each pedestrian.

* 9 pages, 4 figures, 3 tables. Accepted to the 2022 IEEE Intelligent Vehicles Symposium 
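
The low-level layer can be sketched with a classic Social Force update, where the behavior tree supplies the goal and the pedestrian relaxes toward its desired velocity while being repelled by neighbors. The parameter values are illustrative defaults; the paper uses an adapted variant of the model.

```python
import numpy as np

def social_force_step(pos, vel, goal, neighbors, dt=0.1,
                      desired_speed=1.4, tau=0.5, A=2.0, B=0.3):
    """One Euler step; pos, vel, goal are 2D arrays, neighbors is a
    list of other pedestrians' 2D positions."""
    # Driving force: relax toward the desired velocity along the goal direction.
    direction = (goal - pos) / (np.linalg.norm(goal - pos) + 1e-9)
    force = (desired_speed * direction - vel) / tau
    # Repulsive forces from nearby pedestrians, decaying exponentially.
    for other in neighbors:
        diff = pos - other
        dist = np.linalg.norm(diff) + 1e-9
        force += A * np.exp(-dist / B) * (diff / dist)
    vel = vel + force * dt
    return pos + vel * dt, vel
```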

A Safety Assurable Human-Inspired Perception Architecture

May 10, 2022
Rick Salay, Krzysztof Czarnecki

Although artificial intelligence-based perception (AIP) using deep neural networks (DNNs) has achieved near-human-level performance, its well-known limitations are obstacles to the safety assurance needed in autonomous applications. These include vulnerability to adversarial inputs, inability to handle novel inputs, and non-interpretability. While research on addressing these limitations is active, in this paper we argue that a fundamentally different approach is needed to address them. Inspired by dual-process models of human cognition, in which Type 1 thinking is fast and non-conscious while Type 2 thinking is slow and based on conscious reasoning, we propose a dual-process architecture for safe AIP. We review research on how humans address the simplest non-trivial perception problem, image classification, and sketch a corresponding AIP architecture for this task. We argue that this architecture can provide a systematic way of addressing the limitations of AIP using DNNs and an approach to assurance of human-level performance and beyond. We conclude by discussing what components of the architecture may already be addressed by existing work and what remains future work.
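
A minimal sketch of the dispatch logic the dual-process view suggests for image classification: a fast Type 1 DNN answers directly when confident, and defers to a slower, deliberate Type 2 reasoner otherwise. The confidence gate and the component interfaces are assumptions for illustration, not the paper's specification.

```python
def classify(image, type1_dnn, type2_reasoner, threshold=0.9):
    label, confidence = type1_dnn(image)   # fast, non-conscious analogue
    if confidence >= threshold:
        return label                       # Type 1 answer accepted as-is
    # Low-confidence (novel, ambiguous, or adversarial) input: engage slow,
    # interpretable reasoning, e.g., part-based analysis of the image.
    return type2_reasoner(image)
```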
