Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks and lack systematic summaries and research directions in the future. Our work is divided into 3 independent articles and the first part is a Survey of Surveys (SoS) for total technologies of AD and IVs that involves the history, summarizes the milestones, and provides the perspectives, ethics, and future research directions. This is the second part (Part \uppercase\expandafter{\romannumeral1} for this technical survey) to review the development of control, computing system design, communication, High Definition map (HD map), testing, and human behaviors in IVs. In addition, the third part (Part \uppercase\expandafter{\romannumeral2} for this technical survey) is to review the perception and planning sections. The objective of this paper is to involve all the sections of AD, summarize the latest technical milestones, and guide abecedarians to quickly understand the development of AD and IVs. Combining the SoS and Part \uppercase\expandafter{\romannumeral2}, we anticipate that this work will bring novel and diverse insights to researchers and abecedarians, and serve as a bridge between past and future.
Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks, lack of systematic summary and research directions in the future. Here we propose a Survey of Surveys (SoS) for total technologies of AD and IVs that reviews the history, summarizes the milestones, and provides the perspectives, ethics, and future research directions. To our knowledge, this article is the first SoS with milestones in AD and IVs, which constitutes our complete research work together with two other technical surveys. We anticipate that this article will bring novel and diverse insights to researchers and abecedarians, and serve as a bridge between past and future.
Intelligent vehicles (IVs) have attracted wide attention thanks to the augmented convenience, safety advantages, and potential commercial value. Although a few of autonomous driving unicorns assert that IVs will be commercially deployable by 2025, their deployment is still restricted to small-scale validation due to various issues, among which safety, reliability, and generalization of planning methods are prominent concerns. Precise computation of control commands or trajectories by planning methods remains a prerequisite for IVs, owing to perceptual imperfections under complex environments, which pose an obstacle to the successful commercialization of IVs. This paper aims to review state-of-the-art planning methods, including pipeline planning and end-to-end planning methods. In terms of pipeline methods, a survey of selecting algorithms is provided along with a discussion of the expansion and optimization mechanisms, whereas in end-to-end methods, the training approaches and verification scenarios of driving tasks are points of concern. Experimental platforms are reviewed to facilitate readers in selecting suitable training and validation methods. Finally, the current challenges and future directions are discussed. The side-by-side comparison presented in this survey helps to gain insights into the strengths and limitations of the reviewed methods, which also assists with system-level design choices.
Pseudo supervision is regarded as the core idea in semi-supervised learning for semantic segmentation, and there is always a tradeoff between utilizing only the high-quality pseudo labels and leveraging all the pseudo labels. Addressing that, we propose a novel learning approach, called Conservative-Progressive Collaborative Learning (CPCL), among which two predictive networks are trained in parallel, and the pseudo supervision is implemented based on both the agreement and disagreement of the two predictions. One network seeks common ground via intersection supervision and is supervised by the high-quality labels to ensure a more reliable supervision, while the other network reserves differences via union supervision and is supervised by all the pseudo labels to keep exploring with curiosity. Thus, the collaboration of conservative evolution and progressive exploration can be achieved. To reduce the influences of the suspicious pseudo labels, the loss is dynamic re-weighted according to the prediction confidence. Extensive experiments demonstrate that CPCL achieves state-of-the-art performance for semi-supervised semantic segmentation.
We propose a precise and efficient normal estimation method that can deal with noise and nonuniform density for unstructured 3D point clouds. Unlike existing approaches that directly take patches and ignore the local neighborhood relationships, which make them susceptible to challenging regions such as sharp edges, we propose to learn graph convolutional feature representation for normal estimation, which emphasizes more local neighborhood geometry and effectively encodes intrinsic relationships. Additionally, we design a novel adaptive module based on the attention mechanism to integrate point features with their neighboring features, hence further enhancing the robustness of the proposed normal estimator against point density variations. To make it more distinguishable, we introduce a multi-scale architecture in the graph block to learn richer geometric features. Our method outperforms competitors with the state-of-the-art accuracy on various benchmark datasets, and is quite robust against noise, outliers, as well as the density variations.
Most conventional crowd counting methods utilize a fully-supervised learning framework to learn a mapping between scene images and crowd density maps. Under the circumstances of such fully-supervised training settings, a large quantity of expensive and time-consuming pixel-level annotations are required to generate density maps as the supervision. One way to reduce costly labeling is to exploit self-structural information and inner-relations among unlabeled images. Unlike the previous methods utilizing these relations and structural information from the original image level, we explore such self-relations from the latent feature spaces because it can extract more abundant relations and structural information. Specifically, we propose S$^2$FPR which can extract structural information and learn partial orders of coarse-to-fine pyramid features in the latent space for better crowd counting with massive unlabeled images. In addition, we collect a new unlabeled crowd counting dataset (FUDAN-UCC) with 4,000 images in total for training. One by-product is that our proposed S$^2$FPR method can leverage numerous partial orders in the latent space among unlabeled images to strengthen the model representation capability and reduce the estimation errors for the crowd counting task. Extensive experiments on four benchmark datasets, i.e. the UCF-QNRF, the ShanghaiTech PartA and PartB, and the UCF-CC-50, show the effectiveness of our method compared with previous semi-supervised methods. The source code and dataset are available at https://github.com/bridgeqiqi/S2FPR.
Deep learning has become the most powerful machine learning tool in the last decade. However, how to efficiently train deep neural networks remains to be thoroughly solved. The widely used minibatch stochastic gradient descent (SGD) still needs to be accelerated. As a promising tool to better understand the learning dynamic of minibatch SGD, the information bottleneck (IB) theory claims that the optimization process consists of an initial fitting phase and the following compression phase. Based on this principle, we further study typicality sampling, an efficient data selection method, and propose a new explanation of how it helps accelerate the training process of the deep networks. We show that the fitting phase depicted in the IB theory will be boosted with a high signal-to-noise ratio of gradient approximation if the typicality sampling is appropriately adopted. Furthermore, this finding also implies that the prior information of the training set is critical to the optimization process and the better use of the most important data can help the information flow through the bottleneck faster. Both theoretical analysis and experimental results on synthetic and real-world datasets demonstrate our conclusions.
To effectively optimize Takagi-Sugeno-Kang (TSK) fuzzy systems for regression problems, a mini-batch gradient descent with regularization, DropRule, and AdaBound (MBGD-RDA) algorithm was recently proposed. This paper further proposes FCM-RDpA, which improves MBGD-RDA by replacing the grid partition approach in rule initialization by fuzzy c-means clustering, and AdaBound by Powerball AdaBelief, which integrates recently proposed Powerball gradient and AdaBelief to further expedite and stabilize parameter optimization. Extensive experiments on 22 regression datasets with various sizes and dimensionalities validated the superiority of FCM-RDpA over MBGD-RDA, especially when the feature dimensionality is higher. We also propose an additional approach, FCM-RDpAx, that further improves FCM-RDpA by using augmented features in both the antecedents and consequents of the rules.
Medical diagnostic image analysis (e.g., CT scan or X-Ray) using machine learning is an efficient and accurate way to detect COVID-19 infections. However, sharing diagnostic images across medical institutions is usually not allowed due to the concern of patients' privacy. This causes the issue of insufficient datasets for training the image classification model. Federated learning is an emerging privacy-preserving machine learning paradigm that produces an unbiased global model based on the received updates of local models trained by clients without exchanging clients' local data. Nevertheless, the default setting of federated learning introduces huge communication cost of transferring model updates and can hardly ensure model performance when data heterogeneity of clients heavily exists. To improve communication efficiency and model performance, in this paper, we propose a novel dynamic fusion-based federated learning approach for medical diagnostic image analysis to detect COVID-19 infections. First, we design an architecture for dynamic fusion-based federated learning systems to analyse medical diagnostic images. Further, we present a dynamic fusion method to dynamically decide the participating clients according to their local model performance and schedule the model fusion-based on participating clients' training time. In addition, we summarise a category of medical diagnostic image datasets for COVID-19 detection, which can be used by the machine learning community for image analysis. The evaluation results show that the proposed approach is feasible and performs better than the default setting of federated learning in terms of model performance, communication efficiency and fault tolerance.
Ordinal regression is a type of regression techniques used for predicting an ordinal variable. Recent methods formulate an ordinal regression problem as a series of binary classification problems. Such methods cannot ensure the global ordinal relationship is preserved since the relationships among different binary classifiers are neglected. We propose a novel ordinal regression approach called Deep Ordinal Regression Forests (DORFs), which is constructed with the differentiable decision trees for obtaining precise and stable global ordinal relationships. The advantages of the proposed DORFs are twofold. First, instead of learning a series of binary classifiers independently, the proposed method learns an ordinal distribution for ordinal regression. Second, the differentiable decision trees can be trained together with the ordinal distribution in an end-to-end manner. The effectiveness of the proposed DORFs is verified on two ordinal regression tasks, i.e., facial age estimation and image aesthetic assessment, showing significant improvements and better stability over the state-of-the-art ordinal regression methods.