Abstract:Recent multimodal large language models (MLLM) such as GPT-4o and GPT-4v have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic signs from the original road images. To reduce the dependence on training data and improve the performance stability of cross-country TSR, we introduce a cross-domain few-shot in-context learning method based on the MLLM. To enhance MLLM's fine-grained recognition ability of traffic signs, the proposed method generates corresponding description texts using template traffic signs. These description texts contain key information about the shape, color, and composition of traffic signs, which can stimulate the ability of MLLM to perceive fine-grained traffic sign categories. By using the description texts, our method reduces the cross-domain differences between template and real traffic signs. Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels. We perform comprehensive evaluations on the German traffic sign recognition benchmark dataset, the Belgium traffic sign dataset, and two real-world datasets taken from Japan. The experimental results show that our method significantly enhances the TSR performance.
Abstract:Model Predictive Control (MPC)-based trajectory planning has been widely used in robotics, and incorporating Control Barrier Function (CBF) constraints into MPC can greatly improve its obstacle avoidance efficiency. Unfortunately, traditional optimizers are resource-consuming and slow to solve such non-convex constrained optimization problems (COPs) while learning-based methods struggle to satisfy the non-convex constraints. In this paper, we propose SOMTP algorithm, a self-supervised learning-based optimizer for CBF-MPC trajectory planning. Specifically, first, SOMTP employs problem transcription to satisfy most of the constraints. Then the differentiable SLPG correction is proposed to move the solution closer to the safe set and is then converted as the guide policy in the following training process. After that, inspired by the Augmented Lagrangian Method (ALM), our training algorithm integrated with guide policy constraints is proposed to enable the optimizer network to converge to a feasible solution. Finally, experiments show that the proposed algorithm has better feasibility than other learning-based methods and can provide solutions much faster than traditional optimizers with similar optimality.
Abstract:In this paper, we propose a new dataset distillation method that considers balancing global structure and local details when distilling the information from a large dataset into a generative model. Dataset distillation has been proposed to reduce the size of the required dataset when training models. The conventional dataset distillation methods face the problem of long redeployment time and poor cross-architecture performance. Moreover, previous methods focused too much on the high-level semantic attributes between the synthetic dataset and the original dataset while ignoring the local features such as texture and shape. Based on the above understanding, we propose a new method for distilling the original image dataset into a generative model. Our method involves using a conditional generative adversarial network to generate the distilled dataset. Subsequently, we ensure balancing global structure and local details in the distillation process, continuously optimizing the generator for more information-dense dataset generation.
Abstract:Herein, we propose a novel dataset distillation method for constructing small informative datasets that preserve the information of the large original datasets. The development of deep learning models is enabled by the availability of large-scale datasets. Despite unprecedented success, large-scale datasets considerably increase the storage and transmission costs, resulting in a cumbersome model training process. Moreover, using raw data for training raises privacy and copyright concerns. To address these issues, a new task named dataset distillation has been introduced, aiming to synthesize a compact dataset that retains the essential information from the large original dataset. State-of-the-art (SOTA) dataset distillation methods have been proposed by matching gradients or network parameters obtained during training on real and synthetic datasets. The contribution of different network parameters to the distillation process varies, and uniformly treating them leads to degraded distillation performance. Based on this observation, we propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance by automatically assigning importance weights to different network parameters during distillation, thereby synthesizing more robust distilled datasets. IADD demonstrates superior performance over other SOTA dataset distillation methods based on parameter matching on multiple benchmark datasets and outperforms them in terms of cross-architecture generalization. In addition, the analysis of self-adaptive weights demonstrates the effectiveness of IADD. Furthermore, the effectiveness of IADD is validated in a real-world medical application such as COVID-19 detection.
Abstract:In this letter, we aim to investigate whether laboratory rats' pain can be automatically assessed through their facial expressions. To this end, we began by presenting a publicly available dataset called RatsPain, consisting of 1,138 facial images captured from six rats that underwent an orthodontic treatment operation. Each rat' facial images in RatsPain were carefully selected from videos recorded either before or after the operation and well labeled by eight annotators according to the Rat Grimace Scale (RGS). We then proposed a novel deep learning method called PainSeeker for automatically assessing pain in rats via facial expressions. PainSeeker aims to seek pain-related facial local regions that facilitate learning both pain discriminative and head pose robust features from facial expression images. To evaluate the PainSeeker, we conducted extensive experiments on the RatsPain dataset. The results demonstrate the feasibility of assessing rats' pain from their facial expressions and also verify the effectiveness of the proposed PainSeeker in addressing this emerging but intriguing problem. The RasPain dataset can be freely obtained from https://github.com/xhzongyuan/RatsPain.
Abstract:Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, where participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, where participants submitted their training methodologies with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve for discerning between severe and non-severe COVID-19 of 0.815. The Final phase solutions of all finalists improved upon their Qualification phase solutions.HSUXJM-TNZF9CHSUXJM-TNZF9C
Abstract:Owing to uncertainties in both kinematics and dynamics, the current trajectory tracking framework for mobile robots like spherical robots cannot function effectively on multiple terrains, especially uneven and unknown ones. Since this is a prerequisite for robots to execute tasks in the wild, we enhance our previous hierarchical trajectory tracking framework to handle this issue. First, a modified adaptive RBF neural network (RBFNN) is proposed to represent all uncertainties in kinodynamics. Then the Lyapunov function is utilized to design its adaptive law, and a variable step-size algorithm is employed in the weights update procedure to accelerate convergence and improve stability. Hence, a new adaptive model prediction control-based instruction planner (VAN-MPC) is proposed. Without modifying the bottom controllers, we finally develop the multi-terrain trajectory tracking framework by employing the new instruction planner VAN-MPC. The practical experiments demonstrate its effectiveness and robustness.
Abstract:Due to nonholonomic dynamics, the motion planning of nonholonomic robots is always a difficult problem. This letter presents a Discrete States-based Trajectory Planning(DSTP) algorithm for autonomous nonholonomic robots. The proposed algorithm represents the trajectory as x and y positions, orientation angle, longitude velocity and acceleration, angular velocity, and time intervals. More variables make the expression of optimization and constraints simpler, reduce the error caused by too many approximations, and also handle the gear shifting situation. L-BFGS-B is used to deal with the optimization of many variables and box constraints, thus speeding up the problem solving. Various simulation experiments compared with prior works have validated that our algorithm has an order-of-magnitude efficiency advantage and can generate a smoother trajectory with a high speed and low control effort. Besides, real-world experiments are also conducted to verify the feasibility of our algorithm in real scenes. We will release our codes as ros packages.
Abstract:Motion control is essential for all autonomous mobile robots, and even more so for spherical robots. Due to the uniqueness of the spherical robot, its motion control must not only ensure accurate tracking of the target commands, but also minimize fluctuations in the robot's attitude and motors' current while tracking. In this paper, model predictive control (MPC) is applied to the control of spherical robots and an MPC-based motion control framework is designed. There are two controllers in the framework, an optimal velocity controller ESO-MPC which combines extend states observers (ESO) and MPC, and an optimal orientation controller that uses multilayer perceptron (MLP) to generate accurate trajectories and MPC with changing weights to achieve optimal control. Finally, the performance of individual controllers and the whole control framework are verified by physical experiments. The experimental results show that the MPC-based motion control framework proposed in this work is much better than PID in terms of rapidity and accuracy, and has great advantages over sliding mode controller (SMC) for overshoot, attitude stability, current stability and energy consumption.
Abstract:Background and objective: COVID-19 and its variants have caused significant disruptions in over 200 countries and regions worldwide, affecting the health and lives of billions of people. Detecting COVID-19 from chest X-Ray (CXR) images has become one of the fastest and easiest methods for detecting COVID-19 since the common occurrence of radiological pneumonia findings in COVID-19 patients. We present a novel high-accuracy COVID-19 detection method that uses CXR images. Methods: Our method consists of two phases. One is self-supervised learning-based pertaining; the other is batch knowledge ensembling-based fine-tuning. Self-supervised learning-based pretraining can learn distinguished representations from CXR images without manually annotated labels. On the other hand, batch knowledge ensembling-based fine-tuning can utilize category knowledge of images in a batch according to their visual feature similarities to improve detection performance. Unlike our previous implementation, we introduce batch knowledge ensembling into the fine-tuning phase, reducing the memory used in self-supervised learning and improving COVID-19 detection accuracy. Results: On two public COVID-19 CXR datasets, namely, a large dataset and an unbalanced dataset, our method exhibited promising COVID-19 detection performance. Our method maintains high detection accuracy even when annotated CXR training images are reduced significantly (e.g., using only 10% of the original dataset). In addition, our method is insensitive to changes in hyperparameters. Conclusions: The proposed method outperforms other state-of-the-art COVID-19 detection methods in different settings. Our method can reduce the workloads of healthcare providers and radiologists.