Sentiment analysis is a crucial task in natural language processing that involves identifying and extracting subjective sentiment from text. Self-training has recently emerged as an economical and efficient technique for developing sentiment analysis models by leveraging a small amount of labeled data and a larger amount of unlabeled data. However, the performance of a self-training procedure heavily relies on the choice of the instance selection strategy, which has not been studied thoroughly. This paper presents an empirical study on various instance selection strategies for self-training on two public sentiment datasets, and investigates the influence of the strategy and hyper-parameters on the performance of self-training in various few-shot settings.
Forecasting the scalable future states of surrounding traffic participants in complex traffic scenarios is a critical capability for autonomous vehicles, as it enables safe and feasible decision-making. Recent successes in learning-based prediction and planning have introduced two primary challenges: generating accurate joint predictions for the environment and integrating prediction guidance for planning purposes. To address these challenges, we propose a two-stage integrated neural planning framework, termed OPGP, that incorporates joint prediction guidance from occupancy forecasting. The preliminary planning phase simultaneously outputs the predicted occupancy for various types of traffic actors based on imitation learning objectives, taking into account shared interactions, scene context, and actor dynamics within a unified Transformer structure. Subsequently, the transformed occupancy prediction guides optimization to further inform safe and smooth planning under Frenet coordinates. We train our planner using a large-scale, real-world driving dataset and validate it in open-loop configurations. Our proposed planner outperforms strong learning-based methods, exhibiting improved performance due to occupancy prediction guidance.
Autonomous vehicles operating in complex real-world environments require accurate predictions of interactive behaviors between traffic participants. While existing works focus on modeling agent interactions based on their past trajectories, their future interactions are often ignored. This paper addresses the interaction prediction problem by formulating it with hierarchical game theory and proposing the GameFormer framework to implement it. Specifically, we present a novel Transformer decoder structure that uses the prediction results from the previous level together with the common environment background to iteratively refine the interaction process. Moreover, we propose a learning process that regulates an agent's behavior at the current level to respond to other agents' behaviors from the last level. Through experiments on a large-scale real-world driving dataset, we demonstrate that our model can achieve state-of-the-art prediction accuracy on the interaction prediction task. We also validate the model's capability to jointly reason about the ego agent's motion plans and other agents' behaviors in both open-loop and closed-loop planning tests, outperforming a variety of baseline methods.
Pairwise learning strategies are prevalent for optimizing recommendation models on implicit feedback data, which usually learns user preference by discriminating between positive (i.e., clicked by a user) and negative items (i.e., obtained by negative sampling). However, the size of different item groups (specified by item attribute) is usually unevenly distributed. We empirically find that the commonly used uniform negative sampling strategy for pairwise algorithms (e.g., BPR) can inherit such data bias and oversample the majority item group as negative instances, severely countering group fairness on the item side. In this paper, we propose a Fairly adaptive Negative sampling approach (FairNeg), which improves item group fairness via adaptively adjusting the group-level negative sampling distribution in the training process. In particular, it first perceives the model's unfairness status at each step and then adjusts the group-wise sampling distribution with an adaptive momentum update strategy for better facilitating fairness optimization. Moreover, a negative sampling distribution Mixup mechanism is proposed, which gracefully incorporates existing importance-aware sampling techniques intended for mining informative negative samples, thus allowing for achieving multiple optimization purposes. Extensive experiments on four public datasets show our proposed method's superiority in group fairness enhancement and fairness-utility tradeoff.
Predicting the behaviors of other road users is crucial to safe and intelligent decision-making for autonomous vehicles (AVs). However, most motion prediction models ignore the influence of the AV's actions and the planning module has to treat other agents as unalterable moving obstacles. To address this problem, this paper proposes an interaction-aware motion prediction model that is able to predict other agents' future trajectories according to the ego agent's future plan, i.e., their reactions to the ego's actions. Specifically, we employ Transformers to effectively encode the driving scene and incorporate the AV's plan in decoding the predicted trajectories. To train the model to accurately predict the reactions of other agents, we develop an online learning framework, where the ego agent explores the environment and collects other agents' reactions to itself. We validate the decision-making and learning framework in three highly interactive simulated driving scenarios. The results reveal that our decision-making method significantly outperforms the reinforcement learning methods in terms of data efficiency and performance. We also find that using the interaction-aware model can bring better performance than the non-interaction-aware model and the exploration process helps improve the success rate in testing.
Making safe and human-like decisions is an essential capability of autonomous driving systems and learning-based behavior planning is a promising pathway toward this objective. Distinguished from existing learning-based methods that directly output decisions, this work introduces a predictive behavior planning framework that learns to predict and evaluate from human driving data. Concretely, a behavior generation module first produces a diverse set of candidate behaviors in the form of trajectory proposals. Then the proposed conditional motion prediction network is employed to forecast other agents' future trajectories conditioned on each trajectory proposal. Given the candidate plans and associated prediction results, we learn a scoring module to evaluate the plans using maximum entropy inverse reinforcement learning (IRL). We conduct comprehensive experiments to validate the proposed framework on a large-scale real-world urban driving dataset. The results reveal that the conditional prediction model is able to forecast multiple possible future trajectories given a candidate behavior and the prediction results are reactive to different plans. Moreover, the IRL-based scoring module can properly evaluate the trajectory proposals and select close-to-human ones. The proposed framework outperforms other baseline methods in terms of similarity to human driving trajectories. Moreover, we find that the conditional prediction model can improve both prediction and planning performance compared to the non-conditional model, and learning the scoring module is critical to correctly evaluating the candidate plans to align with human drivers.
Decision-making for urban autonomous driving is challenging due to the stochastic nature of interactive traffic participants and the complexity of road structures. Although reinforcement learning (RL)-based decision-making scheme is promising to handle urban driving scenarios, it suffers from low sample efficiency and poor adaptability. In this paper, we propose Scene-Rep Transformer to improve the RL decision-making capabilities with better scene representation encoding and sequential predictive latent distillation. Specifically, a multi-stage Transformer (MST) encoder is constructed to model not only the interaction awareness between the ego vehicle and its neighbors but also intention awareness between the agents and their candidate routes. A sequential latent Transformer (SLT) with self-supervised learning objectives is employed to distill the future predictive information into the latent scene representation, in order to reduce the exploration space and speed up training. The final decision-making module based on soft actor-critic (SAC) takes as input the refined latent scene representation from the Scene-Rep Transformer and outputs driving actions. The framework is validated in five challenging simulated urban scenarios with dense traffic, and its performance is manifested quantitatively by the substantial improvements in data efficiency and performance in terms of success rate, safety, and efficiency. The qualitative results reveal that our framework is able to extract the intentions of neighbor agents to help make decisions and deliver more diversified driving behaviors.
Making an accurate prediction of occupancy and flow is essential to enable better safety and interaction for autonomous vehicles under complex traffic scenarios. This work proposes STrajNet: a multi-modal Swin Transformerbased framework for effective scene occupancy and flow predictions. We employ Swin Transformer to encode the image and interaction-aware motion representations and propose a cross-attention module to inject motion awareness into grid cells across different time steps. Flow and occupancy predictions are then decoded through temporalsharing Pyramid decoders. The proposed method shows competitive prediction accuracy and other evaluation metrics in the Waymo Open Dataset benchmark.
Predicting the future states of surrounding traffic participants and planning a safe, smooth, and socially compliant trajectory accordingly is crucial for autonomous vehicles. There are two major issues with the current autonomous driving system: the prediction module is often decoupled from the planning module and the cost function for planning is hard to specify and tune. To tackle these issues, we propose an end-to-end differentiable framework that integrates prediction and planning modules and is able to learn the cost function from data. Specifically, we employ a differentiable nonlinear optimizer as the motion planner, which takes the predicted trajectories of surrounding agents given by the neural network as input and optimizes the trajectory for the autonomous vehicle, thus enabling all operations in the framework to be differentiable including the cost function weights. The proposed framework is trained on a large-scale real-world driving dataset to imitate human driving trajectories in the entire driving scene and validated in both open-loop and closed-loop manners. The open-loop testing results reveal that the proposed method outperforms the baseline methods across a variety of metrics and delivers planning-centric prediction results, allowing the planning module to output close-to-human trajectories. In closed-loop testing, the proposed method shows the ability to handle complex urban driving scenarios and robustness against the distributional shift that imitation learning methods suffer from. Importantly, we find that joint training of planning and prediction modules achieves better performance than planning with a separate trained prediction module in both open-loop and closed-loop tests. Moreover, the ablation study indicates that the learnable components in the framework are essential to ensure planning stability and performance.
Crowdsourcing has emerged as a popular approach for collecting annotated data to train supervised machine learning models. However, annotator bias can lead to defective annotations. Though there are a few works investigating individual annotator bias, the group effects in annotators are largely overlooked. In this work, we reveal that annotators within the same demographic group tend to show consistent group bias in annotation tasks and thus we conduct an initial study on annotator group bias. We first empirically verify the existence of annotator group bias in various real-world crowdsourcing datasets. Then, we develop a novel probabilistic graphical framework GroupAnno to capture annotator group bias with a new extended Expectation Maximization (EM) training algorithm. We conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the effectiveness of our model in modeling annotator group bias in label aggregation and model learning over competitive baselines.