Simultaneous localization and mapping (SLAM) is critical to the implementation of autonomous driving. Most LiDAR-inertial SLAM algorithms assume a static environment, leading to unreliable localization in dynamic environments. Furthermore, accurate tracking of moving objects is of great significance for the control and planning of autonomous vehicle operation. This study proposes LIMOT, a tightly-coupled multi-object tracking and LiDAR-inertial SLAM system capable of accurately estimating the poses of both the ego-vehicle and surrounding objects. First, we use 3D bounding boxes generated by an object detector to represent all movable objects and perform LiDAR odometry using inertial measurement unit (IMU) pre-integration results. Based on the historical trajectories of tracked objects in a sliding window, we perform robust object association. We propose a trajectory-based dynamic feature filtering method, which filters out features belonging to moving objects by leveraging tracking results. Factor graph-based optimization is then conducted to optimize the bias of the IMU and the poses of both the ego-vehicle and surrounding objects in a sliding window. Experiments conducted on the KITTI datasets show that our method achieves better pose and tracking accuracy than our previous work DL-SLOT and other SLAM and multi-object tracking baseline methods.
Meta-reinforcement learning enables artificial agents to learn from related training tasks and adapt to new tasks efficiently with minimal interaction data. However, most existing research is still limited to narrow task distributions that are parametric and stationary, and does not consider out-of-distribution tasks during evaluation, which restricts its applicability. In this paper, we propose MoSS, a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning to address this challenge. We extend meta-RL to broad non-parametric task distributions which have never been explored before, and also achieve state-of-the-art results in non-stationary and out-of-distribution tasks. Specifically, MoSS consists of a task inference module and a policy module. We utilize the Gaussian mixture model for task representation to imitate the parametric and non-parametric task variations. Additionally, our online adaptation strategy enables the agent to react at the first sight of a task change, making it applicable to non-stationary tasks. MoSS also exhibits strong generalization robustness in out-of-distribution tasks, which benefits from the reliable and robust task representation. The policy is built on top of an off-policy RL algorithm and the entire network is trained completely off-policy to ensure high sample efficiency. On MuJoCo and Meta-World benchmarks, MoSS outperforms prior works in terms of asymptotic performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization robustness on broad and diverse task distributions.
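To make the Gaussian mixture idea in the abstract above concrete, the following is a minimal sketch of how a latent task variable can be softly assigned to mixture components. It is not the MoSS implementation: the function name `gmm_responsibilities`, the one-dimensional latent, and the fixed mixture parameters are all assumptions made for illustration. One possible reading is that distinct components stand for qualitatively different task families, while the position within a component captures parametric variation.

```python
import math


def gmm_responsibilities(z, weights, means, variances):
    """Posterior probability of each mixture component for a 1-D latent z.

    All arguments are hypothetical illustration values, not MoSS parameters.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log density of a univariate Gaussian evaluated at z.
        log_pdf = -0.5 * (math.log(2 * math.pi * var) + (z - mu) ** 2 / var)
        log_terms.append(math.log(w) + log_pdf)
    # Normalize with the log-sum-exp trick for numerical stability.
    top = max(log_terms)
    exps = [math.exp(t - top) for t in log_terms]
    total = sum(exps)
    return [e / total for e in exps]


# Two equally weighted components centred at -2 and +2 (assumed values).
resp = gmm_responsibilities(1.8, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
print(resp)
```

A latent value close to the second mean is assigned almost entirely to the second component, which is the kind of soft task identification a mixture representation allows.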
Deep reinforcement learning (RL) has raised high expectations for tackling challenging manipulation tasks in an autonomous and self-directed fashion. Despite the significant strides made in the development of reinforcement learning, the practical deployment of this paradigm is hindered by at least two barriers, namely, engineering a reward function and ensuring safety guarantees for learning-based controllers. In this paper, we address these limitations by proposing a framework that merges a reinforcement learning planner that is trained using sparse rewards with a model predictive controller (MPC) actor, thereby offering a safe policy. On the one hand, the RL planner learns from sparse rewards by selecting intermediate goals that are easy to achieve in the short term and promising to lead to target goals in the long term. On the other hand, the MPC actor takes the suggested intermediate goals from the RL planner as the input and predicts how the robot's action will enable it to reach that goal while avoiding any obstacles over a short period of time. We evaluated our method on four challenging manipulation tasks with dynamic obstacles and the results demonstrate that, by leveraging the complementary strengths of these two components, the agent can solve manipulation tasks in complex, dynamic environments safely with a $100\%$ success rate. Videos are available at \url{https://videoviewsite.wixsite.com/mpc-hgg}.
Halftoning aims to reproduce a continuous-tone image with pixels whose intensities are constrained to two discrete levels. This technique has been deployed on every printer, and the majority of them adopt fast methods (e.g., ordered dithering, error diffusion) that fail to render structural details, which determine a halftone's quality. Other prior methods, which pursue visual quality by searching for the optimal halftone solution, suffer instead from high computational cost. In this paper, we propose a fast and structure-aware halftoning method via a data-driven approach. Specifically, we formulate halftoning as a reinforcement learning problem, in which each binary pixel's value is regarded as an action chosen by a virtual agent with a shared fully convolutional neural network (CNN) policy. In the offline phase, an effective gradient estimator is utilized to train the agents to produce high-quality halftones in one action step. Then, halftones can be generated online by one fast CNN inference. In addition, we propose a novel anisotropy-suppressing loss function, which brings the desirable blue-noise property. Finally, we find that optimizing SSIM could result in holes in flat areas, which can be avoided by weighting the metric with the contone's contrast map. Experiments show that our framework can effectively train a light-weight CNN, which is 15x faster than previous structure-aware methods, to generate blue-noise halftones with satisfactory visual quality. We also present a prototype of deep multitoning to demonstrate the extensibility of our method.
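For readers unfamiliar with the fast baseline methods mentioned in the abstract above, the following is a minimal sketch of classic Floyd-Steinberg error diffusion. It illustrates the baseline family only, not the proposed reinforcement learning method. The function name `floyd_steinberg`, the list-of-rows image format, and the 0.5 threshold are assumptions made for illustration.

```python
def floyd_steinberg(image):
    """Halftone a grayscale image (rows of floats in [0, 1]) to 0/1 pixels."""
    height, width = len(image), len(image[0])
    # Work on a copy so the caller's continuous-tone image is left unchanged.
    work = [list(row) for row in image]
    halftone = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 1 if old >= 0.5 else 0
            halftone[y][x] = new
            err = old - new
            # Push the quantization error onto not-yet-visited neighbours
            # using the standard 7/16, 3/16, 5/16, 1/16 weights.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return halftone


# A flat 25% gray patch: roughly a quarter of the output pixels should be on.
gray = [[0.25] * 16 for _ in range(16)]
dots = floyd_steinberg(gray)
coverage = sum(map(sum, dots)) / (16 * 16)
print(coverage)
```

Each pixel is decided in a single raster pass from local error alone, which is why such methods are fast but have no notion of image structure.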
Sea surface temperature (SST) is uniquely important to the Earth's atmosphere since its dynamics are a major force in shaping local and global climate and profoundly affect our ecosystems. Accurate forecasting of SST has significant economic and social implications, for example, better preparation for extreme weather such as severe droughts or tropical cyclones months ahead. However, such a task faces unique challenges due to the intrinsic complexity and uncertainty of ocean systems. Recently, deep learning techniques, such as graph neural networks (GNNs), have been applied to address this task. Although these methods have had some success, they often struggle to capture dynamic spatiotemporal dependencies between signals. To solve this problem, this paper proposes a novel static and dynamic learnable personalized graph convolution network (SD-LPGC). Specifically, two graph learning layers are first constructed to model, respectively, the stable long-term and short-term evolutionary patterns hidden in the multivariate SST signals. Then, a learnable personalized convolution layer is designed to fuse this information. Our experiments on real SST datasets demonstrate the state-of-the-art performance of the proposed approach on the forecasting task.
Autonomous cars are indispensable when humans go further down the hands-free route. Although existing literature highlights that the acceptance of the autonomous car will increase if it drives in a human-like manner, little research has offered a naturalistic experience from a passenger's seat perspective to examine the human likeness of current autonomous cars. The present study tested whether the AI driver could create a human-like ride experience for passengers based on 69 participants' feedback in a real-road scenario. We designed a ride experience-based version of the non-verbal Turing test for automated driving. Participants rode in autonomous cars (driven by either human or AI drivers) as passengers and judged whether the driver was human or AI. The AI driver failed to pass our test because passengers detected the AI driver above chance. In contrast, when the human driver drove the car, the passengers' judgement was around chance. We further investigated how human passengers ascribe humanness in our test. Based on Lewin's field theory, we advanced a computational model combining signal detection theory with pre-trained language models to predict passengers' humanness rating behaviour. We employed affective transition between pre-study baseline emotions and corresponding post-stage emotions as the signal strength of our model. Results showed that the passengers' ascription of humanness increased with greater affective transition. Our study suggests an important role of affective transition in passengers' ascription of humanness, which might become a future direction for autonomous driving.
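As background for the signal detection theory mentioned in the abstract above, the following is a minimal sketch of the classical sensitivity index d', which measures how well judgements separate two conditions. It is not the authors' computational model, which additionally involves pre-trained language models and affective transition. The function name `d_prime`, the mapping of "hit" to "judged AI when the driver was AI" and "false alarm" to "judged AI when the driver was human", the log-linear correction, and the example counts are all assumptions made for illustration.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction keeps both rates strictly between 0 and 1,
    # so the inverse normal CDF is always defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts, not data from the study.
print(d_prime(hits=18, misses=2, false_alarms=5, correct_rejections=15))
```

A d' near zero corresponds to judgements at chance, while a clearly positive d' corresponds to detection above chance.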
Few-shot segmentation (FSS) aims to segment objects of unseen classes given only a few annotated support images. Most existing methods simply stitch query features with independent support prototypes and segment the query image by feeding the mixed features to a decoder. Although significant improvements have been achieved, existing methods still face class biases due to class variants and background confusion. In this paper, we propose a joint framework that combines more valuable class-aware and class-agnostic alignment guidance to facilitate the segmentation. Specifically, we design a hybrid alignment module which establishes multi-scale query-support correspondences to mine the most relevant class-aware information for each query image from the corresponding support features. In addition, we explore utilizing base-class knowledge to generate a class-agnostic prior mask, which distinguishes real background from foreground by highlighting all object regions, especially those of unseen classes. By jointly aggregating class-aware and class-agnostic alignment guidance, better segmentation performance is obtained on query images. Extensive experiments on the PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate that our proposed joint framework performs better, especially in the 1-shot setting.
Meta-reinforcement learning (meta-RL) is a promising approach that enables the agent to learn new tasks quickly. However, most meta-RL algorithms show poor generalization in multiple-task scenarios because rewards alone provide insufficient task information. Language-conditioned meta-RL improves generalization by matching language instructions with the agent's behaviors. Learning from symmetry is an important form of human learning; therefore, combining symmetry and language instructions in meta-RL can help improve the algorithm's generalization and learning efficiency. We thus propose a dual-MDP meta-reinforcement learning method that enables learning new tasks efficiently with symmetric data and language instructions. We evaluate our method on multiple challenging manipulation tasks, and experimental results show that our method can greatly improve the generalization and efficiency of meta-reinforcement learning.