"autonomous cars": models, code, and papers

Robustness of different loss functions and their impact on networks learning capability

Nov 09, 2021
Vishal Rajput

Recent developments in AI have made it ubiquitous, and every industry is trying to adopt some form of intelligent processing of its data. Despite so many advances in the field, AI's full capability is yet to be exploited by industry. Industries that involve some risk factors remain cautious about using AI due to the lack of trust in such autonomous systems. Present-day AI may be very good at many things, but it is very bad at reasoning, and this behavior can lead to catastrophic results. An autonomous car crashing into a person or a drone getting stuck in a tree are examples where AI decisions lead to catastrophic results. To develop insight into, and generate an explanation of, the learning capability of AI, we analyze the working of loss functions. We use two sets of loss functions: generalized loss functions such as binary cross-entropy (BCE) and specialized loss functions such as Dice loss or focal loss. Through a series of experiments, we establish whether combining different loss functions is better than using a single loss function and, if so, why. To establish the difference between generalized and specialized losses, we train several models using the above-mentioned losses and then compare their robustness on adversarial examples. In particular, we look at how fast the accuracy of different models decreases when we change the pixels corresponding to the most salient gradients.
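For readers unfamiliar with the losses named in this abstract, the following is a minimal sketch of their standard textbook forms, plus one possible way of combining them. It is not taken from the paper: the function names, the clipping constant `EPS`, the Dice smoothing term, the focal `gamma`, and the 50/50 weighting in `combined_loss` are all illustrative assumptions.

```python
import math

EPS = 1e-7  # assumed clipping constant to avoid log(0)


def bce_loss(probs, targets):
    """Mean binary cross-entropy over a flat list of pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum(P) + sum(T) + smooth)."""
    overlap = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * overlap + smooth) / (denom + smooth)


def focal_loss(probs, targets, gamma=2.0):
    """Focal loss: cross-entropy scaled by (1 - p_t) ** gamma per pixel."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)
        p_t = p if t == 1 else 1.0 - p  # probability assigned to the true class
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def combined_loss(probs, targets, weight=0.5):
    """Hypothetical combination: weighted sum of BCE and Dice."""
    return weight * bce_loss(probs, targets) + (1.0 - weight) * dice_loss(probs, targets)


probs = [0.9, 0.2, 0.6, 0.1]
targets = [1, 0, 1, 0]
print(bce_loss(probs, targets), dice_loss(probs, targets), focal_loss(probs, targets))
```

Because the focal scaling factor never exceeds 1, the focal loss is never larger than BCE on the same inputs; it mainly down-weights pixels the model already classifies confidently.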


KIT MOMA: A Mobile Machines Dataset

Jul 08, 2020
Yusheng Xiang, Hongzhe Wang, Tianqing Su, Ruoyu Li, Christine Brach, Samuel S. Mao, Marcus Geimer

Mobile machines, which typically work in a closed site, have a high potential to utilize autonomous driving technology. However, vigorous development and innovation are happening mostly in the area of passenger cars. In contrast, although there is also much research on autonomous driving or working in mobile machines, a consensus about the SOTA solution has not yet been reached. We believe that the most urgent problem to solve is the absence of a public and challenging visual dataset that makes the results of different studies comparable. To address the problem, we publish the KIT MOMA dataset, including eight classes of commonly used mobile machines, which can be used as a benchmark to evaluate the SOTA algorithms for detecting mobile construction machines. The images are captured from outside the mobile machines, since we believe fixed cameras on the ground are more suitable if all the machines of interest are working in a closed site. Most of the images in KIT MOMA are from real scenes, whereas some are from the official websites of top construction machine companies. We have also evaluated the performance of YOLO v3 on our dataset; the results indicate that SOTA computer vision algorithms already perform excellently at detecting mobile machines in a specific working site. Together with the dataset, we also upload the trained weights, which can be used directly by engineers from the construction machine industry. The dataset, trained weights, and updates can be found on our GitHub. Moreover, the demo can be found on our YouTube.

* 15 pages; 17 Figures 

RADIATE: A Radar Dataset for Automotive Perception

Oct 18, 2020
Marcel Sheeny, Emanuele De Pellegrin, Saptarshi Mukherjee, Alireza Ahrabian, Sen Wang, Andrew Wallace

Datasets for autonomous cars are essential for the development and benchmarking of perception systems. However, most existing datasets are captured with camera and LiDAR sensors in good weather conditions. In this paper, we present the RAdar Dataset In Adverse weaThEr (RADIATE), aiming to facilitate research on object detection, tracking and scene understanding using radar sensing for safe autonomous driving. RADIATE includes 3 hours of annotated radar images with more than 200K labelled road actors in total, on average about 4.6 instances per radar image. It covers 8 different categories of actors in a variety of weather conditions (e.g., sun, night, rain, fog and snow) and driving scenarios (e.g., parked, urban, motorway and suburban), representing different levels of challenge. To the best of our knowledge, this is the first public radar dataset which provides high-resolution radar images on public roads with a large amount of road actors labelled. The data collected in adverse weather, e.g., fog and snowfall, is unique. Some baseline results of radar based object detection and recognition are given to show that the use of radar data is promising for automotive applications in bad weather, where vision and LiDAR can fail. RADIATE also has stereo images, 32-channel LiDAR and GPS data, directed at other applications such as sensor fusion, localisation and mapping. The public dataset can be accessed at


Priority-based coordination of mobile robots

Oct 03, 2014
Jean Gregoire

Since the end of the 1980s, the development of self-driven autonomous vehicles has been an intensive research area in most major industrial countries. Potential positive socio-economic impacts include a decrease in crashes, a reduction of travel times, energy efficiency improvements, and a reduced need for costly physical infrastructure. Some form of vehicle-to-vehicle and/or vehicle-to-infrastructure cooperation is required to ensure a safe and efficient global transportation system. This thesis deals with a particular form of cooperation by studying the problem of coordinating multiple mobile robots at an intersection area. Most coordination systems proposed in previous work consist of planning a trajectory and controlling the robots along the planned trajectory: this is the plan-as-program paradigm, where planning is considered a generative mechanism of action. The approach of the thesis is to plan priorities -- the relative order in which robots go through the intersection -- which is a much weaker commitment, as many trajectories respect the same priorities. More precisely, priorities encode the homotopy classes of solutions to the coordination problem. Priority assignment is equivalent to choosing a homotopy class to solve the coordination problem instead of a particular trajectory. Once priorities are assigned, robots are controlled through a control law that preserves the assigned priorities, i.e., one ensuring that the described trajectory belongs to the chosen homotopy class. The result is a more robust coordination system -- able to handle a large class of unexpected events in a reactive manner -- that is particularly well adapted to the coordination of autonomous vehicles at intersections where cars, public transport and pedestrians share the road.

* PhD Thesis, 182 pages 
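The idea of planning priorities rather than trajectories can be illustrated with a toy simulation. This is a heavily simplified sketch and not the control law from the thesis: it assumes discrete time, one-dimensional paths measured in cells, and a single shared conflict zone that all robots cross. The function `simulate_crossing` and its parameters are hypothetical.

```python
def simulate_crossing(start_positions, priority_order, zone=(5, 7), max_ticks=1000):
    """Return the order in which robots leave the shared intersection zone.

    start_positions: dict robot name -> starting cell on its own path.
    priority_order: robot names, highest priority first.
    zone: (first cell inside the zone, first cell past the zone); the same
          interval is assumed on every path (a simplifying assumption).
    """
    zone_start, zone_end = zone
    if any(p >= zone_start for p in start_positions.values()):
        raise ValueError("all robots must start before the zone")
    pos = dict(start_positions)
    rank = {name: i for i, name in enumerate(priority_order)}
    exited = []
    for _ in range(max_ticks):
        if len(exited) == len(priority_order):
            break
        for name in priority_order:
            if name in exited:
                continue
            higher_pending = any(
                rank[other] < rank[name] and other not in exited
                for other in priority_order
            )
            # Reactive rule: wait at the zone boundary while any
            # higher-priority robot has not yet cleared the zone.
            if pos[name] == zone_start - 1 and higher_pending:
                continue
            pos[name] += 1
            if pos[name] >= zone_end:
                exited.append(name)
    return exited


starts = {"a": 0, "b": 3, "c": 1}
print(simulate_crossing(starts, ["a", "b", "c"]))  # ['a', 'b', 'c']
print(simulate_crossing(starts, ["b", "c", "a"]))  # ['b', 'c', 'a']
```

Only the crossing order is fixed in advance. The exact timing of each robot emerges from the waiting rule, so a delayed robot shifts the others' trajectories without changing the assigned priorities.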

M4Depth: A motion-based approach for monocular depth estimation on video sequences

May 20, 2021
Michaël Fonder, Damien Ernst, Marc Van Droogenbroeck

Getting the distance to objects is crucial for autonomous vehicles. In instances where depth sensors cannot be used, this distance has to be estimated from RGB cameras. Compared with cars, estimating depth from on-board cameras is more complex on drones because of the lack of constraints on motion during flights. In this paper, we present a method to estimate the distance of objects seen by an on-board camera by using its RGB video stream and drone motion information. Our method is built upon a pyramidal convolutional neural network architecture and uses time recurrence together with geometric constraints imposed by motion to produce pixel-wise depth maps. In our architecture, each level of the pyramid is designed to produce its own depth estimate based on past observations and information provided by the previous level in the pyramid. We introduce a spatial reprojection layer to maintain the spatio-temporal consistency of the data between the levels. We analyse the performance of our approach on Mid-Air, a public drone dataset featuring synthetic drone trajectories recorded in a wide variety of unstructured outdoor environments. Our experiments show that our network outperforms state-of-the-art depth estimation methods and that the use of motion information is the main contributing factor for this improvement. The code of our method is publicly available on GitHub.

* Main paper: 8 pages + references, Appendix: 2 pages 

PCT and Beyond: Towards a Computational Framework for `Intelligent' Communicative Systems

Nov 16, 2016
Prof. Roger K. Moore

Recent years have witnessed increasing interest in the potential benefits of `intelligent' autonomous machines such as robots. Honda's Asimo humanoid robot, iRobot's Roomba robot vacuum cleaner and Google's driverless cars have fired the imagination of the general public, and social media buzz with speculation about a utopian world of helpful robot assistants or the coming robot apocalypse! However, there is a long way to go before autonomous systems reach the level of capabilities required for even the simplest of tasks involving human-robot interaction - especially if it involves communicative behaviour such as speech and language. Of course the field of Artificial Intelligence (AI) has made great strides in these areas, and has moved on from abstract high-level rule-based paradigms to embodied architectures whose operations are grounded in real physical environments. What is still missing, however, is an overarching theory of intelligent communicative behaviour that informs system-level design decisions in order to provide a more coherent approach to system integration. This chapter introduces the beginnings of such a framework inspired by the principles of Perceptual Control Theory (PCT). In particular, it is observed that PCT has hitherto tended to view perceptual processes as a relatively straightforward series of transformations from sensation to perception, and has overlooked the potential of powerful generative model-based solutions that have emerged in practical fields such as visual or auditory scene analysis. Starting from first principles, a sequence of arguments is presented which not only shows how these ideas might be integrated into PCT, but which also extend PCT towards a remarkably symmetric architecture for a needs-driven communicative agent. It is concluded that, if behaviour is the control of perception, then perception is the simulation of behaviour.

* To appear in A. McElhone & W. Mansell (Eds.), Living Control Systems IV: Perceptual Control Theory and the Future of the Life and Social Sciences, Benchmark Publications Inc 

IoT System for Real-Time Near-Crash Detection for Automated Vehicle Testing

Aug 02, 2020
Ruimin Ke, Zhiyong Cui, Yanlong Chen, Meixin Zhu, Yinhai Wang

Our world is moving towards the goal of fully autonomous driving at a fast pace. While the latest automated vehicles (AVs) can handle most real-world scenarios they encounter, a major bottleneck for turning fully autonomous driving into reality is the lack of sufficient corner case data for training and testing AVs. Near-crash data, a widely used surrogate in traffic safety research, can also serve the purpose of AV testing if properly collected. To this end, this paper proposes an Internet-of-Things (IoT) system for real-time near-crash data collection. The system has several cool features. First, it is a low-cost, standalone system that is backward-compatible with any existing vehicle. People can fix the system to their dashboards for near-crash data collection and collision warning without the approval or help of vehicle manufacturers. Second, we propose a new near-crash detection method that models the target's size changes and relative motions with the bounding boxes generated by deep-learning-based object detection and tracking. This near-crash detection method is fast, accurate, and reliable; in particular, it is insensitive to camera parameters and therefore transfers well to different dashboard cameras. We have conducted comprehensive experiments with 100 videos processed locally on Jetson, as well as real-world tests on cars and buses. Besides collecting corner cases, the system can also serve as a white-box platform for testing innovative algorithms and evaluating other AV products. The system contributes to the real-world testing of AVs and has great potential to be brought into large-scale deployment.
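To give a sense of how bounding-box size changes can indicate an approaching target without knowing camera parameters, here is a sketch of the classic time-to-collision estimate from scale change. It is a standard pinhole-camera relation and is not claimed to be the detection method of this paper. The function names and the 1.5 s threshold are hypothetical. Under a pinhole model, the box width is inversely proportional to distance, so with a constant closing speed the time to collision is `dt / (s - 1)`, where `s` is the ratio of the current width to the previous width; the focal length cancels out.

```python
import math


def time_to_collision(width_prev, width_curr, dt):
    """Estimate time to collision (seconds) from two bounding-box widths.

    Assumes a pinhole camera and constant closing speed between frames.
    Returns math.inf when the target is not growing in the image.
    """
    if width_prev <= 0 or width_curr <= 0 or dt <= 0:
        raise ValueError("widths and dt must be positive")
    scale = width_curr / width_prev
    if scale <= 1.0:
        return math.inf  # target is receding or keeping its distance
    return dt / (scale - 1.0)


def is_near_crash(widths, dt, ttc_threshold=1.5):
    """Flag a track if any consecutive pair of frames falls below the threshold.

    ttc_threshold is an illustrative value, not one taken from the paper.
    """
    return any(
        time_to_collision(a, b, dt) < ttc_threshold
        for a, b in zip(widths, widths[1:])
    )


# A box growing from 100 to 110 pixels in 0.1 s gives roughly 1 s to collision.
print(time_to_collision(100.0, 110.0, 0.1))
print(is_near_crash([100.0, 101.0, 102.0], dt=0.1))  # slow growth
print(is_near_crash([100.0, 110.0], dt=0.1))         # rapid growth
```

In practice, per-frame box widths are noisy, so an estimate like this would normally be smoothed over several frames before thresholding.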


Automatic Labeling to Generate Training Data for Online LiDAR-based Moving Object Segmentation

Jan 12, 2022
Xieyuanli Chen, Benedikt Mersch, Lucas Nunes, Rodrigo Marcuzzi, Ignacio Vizzo, Jens Behley, Cyrill Stachniss

Understanding the scene is key for autonomously navigating vehicles and the ability to segment the surroundings online into moving and non-moving objects is a central ingredient for this task. Often, deep learning-based methods are used to perform moving object segmentation (MOS). The performance of these networks, however, strongly depends on the diversity and amount of labeled training data, information that may be costly to obtain. In this paper, we propose an automatic data labeling pipeline for 3D LiDAR data to save the extensive manual labeling effort and to improve the performance of existing learning-based MOS systems by automatically generating labeled training data. Our proposed approach achieves this by processing the data offline in batches. It first exploits an occupancy-based dynamic object removal to detect possible dynamic objects coarsely. Second, it extracts segments among the proposals and tracks them using a Kalman filter. Based on the tracked trajectories, it labels the actually moving objects such as driving cars and pedestrians as moving. In contrast, the non-moving objects, e.g., parked cars, lamps, roads, or buildings, are labeled as static. We show that this approach allows us to label LiDAR data highly effectively and compare our results to those of other label generation methods. We also train a deep neural network with our auto-generated labels and achieve similar performance compared to the one trained with manual labels on the same data, and an even better performance when using additional datasets with labels generated by our approach. Furthermore, we evaluate our method on multiple datasets using different sensors and our experiments indicate that our method can generate labels in diverse environments.

* Under review 
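The tracking and labeling steps of such a pipeline can be sketched as follows. This covers only the last two stages described in the abstract (tracking segment centroids with a Kalman filter and labeling by trajectory). It assumes a constant-velocity motion model applied independently to each axis. The class name, noise variances, and the 0.5 m/s speed threshold are illustrative assumptions, not values from the paper.

```python
import math


class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position measurements."""

    def __init__(self, pos, pos_var=1.0, vel_var=10.0, process_var=0.01, meas_var=0.05):
        self.pos, self.vel = pos, 0.0
        # Covariance entries P = [[p00, p01], [p10, p11]]
        self.p00, self.p01, self.p10, self.p11 = pos_var, 0.0, 0.0, vel_var
        self.q, self.r = process_var, meas_var

    def step(self, z, dt):
        # Predict with F = [[1, dt], [0, 1]] and diagonal process noise q.
        self.pos += self.vel * dt
        p00 = self.p00 + dt * (self.p01 + self.p10) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        # Update with measurement matrix H = [1, 0].
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        y = z - self.pos
        self.pos += k0 * y
        self.vel += k1 * y
        self.p00, self.p01 = (1.0 - k0) * p00, (1.0 - k0) * p01
        self.p10, self.p11 = p10 - k1 * p00, p11 - k1 * p01


def label_track(centroids, dt, speed_threshold=0.5):
    """Label a tracked segment 'moving' or 'static' from its (x, y) centroids."""
    fx = ConstantVelocityKalman1D(centroids[0][0])
    fy = ConstantVelocityKalman1D(centroids[0][1])
    for x, y in centroids[1:]:
        fx.step(x, dt)
        fy.step(y, dt)
    speed = math.hypot(fx.vel, fy.vel)  # filtered speed estimate
    return "moving" if speed > speed_threshold else "static"


driving_car = [(0.5 * i, 0.0) for i in range(30)]  # 5 m/s along x at 10 Hz
parked_car = [(10.0, 5.0)] * 30
print(label_track(driving_car, dt=0.1), label_track(parked_car, dt=0.1))
```

Processing whole trajectories offline, as the abstract describes, means the label can be based on the filtered estimate at the end of the track rather than on noisy frame-to-frame differences.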

Aerial Monocular 3D Object Detection

Aug 08, 2022
Yue Hu, Shaoheng Fang, Weidi Xie, Siheng Chen

Drones equipped with cameras can significantly enhance human ability to perceive the world because of their remarkable maneuverability in 3D space. Ironically, object detection for drones has always been conducted in the 2D image space, which fundamentally limits their ability to understand 3D scenes. Furthermore, existing 3D object detection methods developed for autonomous driving cannot be directly applied to drones due to the lack of deformation modeling, which is essential for the distant aerial perspective with sensitive distortion and small objects. To fill the gap, this work proposes a dual-view detection system named DVDET to achieve aerial monocular object detection in both the 2D image space and the 3D physical space. To address the severe view deformation issue, we propose a novel trainable geo-deformable transformation module that can properly warp information from the drone's perspective to the BEV. Compared to the monocular methods for cars, our transformation includes a learnable deformable network for explicitly revising the severe deviation. To address the dataset challenge, we propose a new large-scale simulation dataset named AM3D-Sim, generated by the co-simulation of AirSIM and CARLA, and a new real-world aerial dataset named AM3D-Real, collected by DJI Matrice 300 RTK. In both datasets, high-quality annotations for 3D object detection are provided. Extensive experiments show that i) aerial monocular 3D object detection is feasible; ii) the model pre-trained on the simulation dataset benefits real-world performance; and iii) DVDET also benefits monocular 3D object detection for cars. To encourage more researchers to investigate this area, we will release the dataset and related code in

* 8 pages, 8 figures