The imitation learning of self-driving vehicle policies through behavioral cloning is often carried out in an open-loop fashion, ignoring the effect of actions to future states. Training such policies purely with Empirical Risk Minimization (ERM) can be detrimental to real-world performance, as it biases policy networks towards matching only open-loop behavior, showing poor results when evaluated in closed-loop. In this work, we develop an efficient and simple-to-implement principle called Closed-loop Weighted Empirical Risk Minimization (CW-ERM), in which a closed-loop evaluation procedure is first used to identify training data samples that are important for practical driving performance and then we these samples to help debias the policy network. We evaluate CW-ERM in a challenging urban driving dataset and show that this procedure yields a significant reduction in collisions as well as other non-differentiable closed-loop metrics.
Despite the strong performance of current NLP models, they can be brittle against adversarial attacks. To enable effective learning against adversarial inputs, we introduce the use of rationale models that can explicitly learn to ignore attack tokens. We find that the rationale models can successfully ignore over 90\% of attack tokens. This approach leads to consistent sizable improvements ($\sim$10\%) over baseline models in robustness on three datasets for both BERT and RoBERTa, and also reliably outperforms data augmentation with adversarial examples alone. In many cases, we find that our method is able to close the gap between model performance on a clean test set and an attacked test set and hence reduce the effect of adversarial attacks.
Audio captioning aims at describing the content of audio clips with human language. Due to the ambiguity of audio, different people may perceive the same audio differently, resulting in caption disparities (i.e., one audio may correlate to several captions with diverse semantics). For that, general audio captioning models achieve the one-to-many training by randomly selecting a correlated caption as the ground truth for each audio. However, it leads to a significant variation in the optimization directions and weakens the model stability. To eliminate this negative effect, in this paper, we propose a two-stage framework for audio captioning: (i) in the first stage, via the contrastive learning, we construct a proxy feature space to reduce the distances between captions correlated to the same audio, and (ii) in the second stage, the proxy feature space is utilized as additional supervision to encourage the model to be optimized in the direction that benefits all the correlated captions. We conducted extensive experiments on two datasets using four commonly used encoder and decoder architectures. Experimental results demonstrate the effectiveness of the proposed method. The code is available at https://github.com/PRIS-CV/Caption-Feature-Space-Regularization.
Electro-optic (EO) modulation is a well-known and essential topic in the field of communications and sensing. Its ultrahigh efficiency is unprecedentedly desired in the current green and data era. However, dramatically increasing the modulation efficiency is difficult due to the monotonic mapping relationship between the electrical signal and modulated optical signal. Here, a new mechanism termed phase-transition EO modulation is revealed from the reciprocal transition between two distinct phase planes arising from the bifurcation. Remarkably, a monolithically integrated mode-locked laser (MLL) is implemented as a prototype. A 24.8-GHz radio-frequency signal is generated and modulated, achieving a modulation energy efficiency of 3.06 fJ/bit improved by about four orders of magnitude and a contrast ratio exceeding 50 dB. Thus, MLL-based phase-transition EO modulation is characterised by ultrahigh modulation efficiency and ultrahigh contrast ratio, as experimentally proved in radio-over-fibre and underwater acoustic-sensing systems. This phase-transition EO modulation opens a new avenue for green communication and ubiquitous connections.
The aspect-based sentiment analysis (ABSA) is a fine-grained task that aims to determine the sentiment polarity towards targeted aspect terms occurring in the sentence. The development of the ABSA task is very much hindered by the lack of annotated data. To tackle this, the prior works have studied the possibility of utilizing the sentiment analysis (SA) datasets to assist in training the ABSA model, primarily via pretraining or multi-task learning. In this article, we follow this line, and for the first time, we manage to apply the Pseudo-Label (PL) method to merge the two homogeneous tasks. While it seems straightforward to use generated pseudo labels to handle this case of label granularity unification for two highly related tasks, we identify its major challenge in this paper and propose a novel framework, dubbed as Dual-granularity Pseudo Labeling (DPL). Further, similar to PL, we regard the DPL as a general framework capable of combining other prior methods in the literature. Through extensive experiments, DPL has achieved state-of-the-art performance on standard benchmarks surpassing the prior work significantly.
Session-based recommendation (SBR) is proposed to recommend items within short sessions given that user profiles are invisible in various scenarios nowadays, such as e-commerce and short video recommendation. There is a common scenario that user specifies a target category of items as a global filter, however previous SBR settings mainly consider the item sequence and overlook the rich target category information in this scenario. Therefore, we define a new task called Category-aware Session-Based Recommendation (CSBR), focusing on the above scenario, in which the user-specified category can be efficiently utilized by the recommendation system. To address the challenges of the proposed task, we develop a novel method called Intention Adaptive Graph Neural Network (IAGNN), which takes advantage of relationship between items and their categories to achieve an accurate recommendation result. Specifically, we construct a category-aware graph with both item and category nodes to represent the complex transition information in the session. An intention-adaptive graph neural network on the category-aware graph is utilized to capture user intention by transferring the historical interaction information to the user-specified category domain. Extensive experiments on three real-world datasets are conducted to show our IAGNN outperforms the state-of-the-art baselines in the new task.
Session-based recommendation (SBR) is a challenging task, which aims at recommending next items based on anonymous interaction sequences. Despite the superior performance of existing methods for SBR, there are still several limitations: (i) Almost all existing works concentrate on single interest extraction and fail to disentangle multiple interests of user, which easily results in suboptimal representations for SBR. (ii) Furthermore, previous methods also ignore the multi-form temporal information, which is significant signal to obtain current intention for SBR. To address the limitations mentioned above, we propose a novel method, called \emph{Temporal aware Multi-Interest Graph Neural Network} (TMI-GNN) to disentangle multi-interest and yield refined intention representations with the injection of two level temporal information. Specifically, by appending multiple interest nodes, we construct a multi-interest graph for current session, and adopt the GNNs to model the item-item relation to capture adjacent item transitions, item-interest relation to disentangle the multi-interests, and interest-item relation to refine the item representation. Meanwhile, we incorporate item-level time interval signals to guide the item information propagation, and interest-level time distribution information to assist the scattering of interest information. Experiments on three benchmark datasets demonstrate that TMI-GNN outperforms other state-of-the-art methods consistently.
ByteScheduler partitions and rearranges tensor transmissions to improve the communication efficiency of distributed Deep Neural Network (DNN) training. The configuration of hyper-parameters (i.e., the partition size and the credit size) is critical to the effectiveness of partitioning and rearrangement. Currently, ByteScheduler adopts Bayesian Optimization (BO) to find the optimal configuration for the hyper-parameters beforehand. In practice, however, various runtime factors (e.g., worker node status and network conditions) change over time, making the statically-determined one-shot configuration result suboptimal for real-world DNN training. To address this problem, we present a real-time configuration method (called AutoByte) that automatically and timely searches the optimal hyper-parameters as the training systems dynamically change. AutoByte extends the ByteScheduler framework with a meta-network, which takes the system's runtime statistics as its input and outputs predictions for speedups under specific configurations. Evaluation results on various DNN models show that AutoByte can dynamically tune the hyper-parameters with low resource usage, and deliver up to 33.2\% higher performance than the best static configuration in ByteScheduler.
Conversational recommendation system (CRS) is able to obtain fine-grained and dynamic user preferences based on interactive dialogue. Previous CRS assumes that the user has a clear target item. However, for many users who resort to CRS, they might not have a clear idea about what they really like. Specifically, the user may have a clear single preference for some attribute types (e.g. color) of items, while for other attribute types, the user may have multiple preferences or even no clear preferences, which leads to multiple acceptable attribute instances (e.g. black and red) of one attribute type. Therefore, the users could show their preferences over items under multiple combinations of attribute instances rather than a single item with unique combination of all attribute instances. As a result, we first propose a more realistic CRS learning setting, namely Multi-Interest Multi-round Conversational Recommendation, where users may have multiple interests in attribute instance combinations and accept multiple items with partially overlapped combinations of attribute instances. To effectively cope with the new CRS learning setting, in this paper, we propose a novel learning framework namely, Multi-Choice questions based Multi-Interest Policy Learning . In order to obtain user preferences more efficiently, the agent generates multi-choice questions rather than binary yes/no ones on specific attribute instance. Besides, we propose a union set strategy to select candidate items instead of existing intersection set strategy in order to overcome over-filtering items during the conversation. Finally, we design a Multi-Interest Policy Learning module, which utilizes captured multiple interests of the user to decide next action, either asking attribute instances or recommending items. Extensive experimental results on four datasets verify the superiority of our method for the proposed setting.
Recently, Truncated Quantile Critics (TQC), using distributional representation of critics, was shown to provide state-of-the-art asymptotic training performance on all environments from the MuJoCo continuous control benchmark suite. Also recently, Randomized Ensemble Double Q-Learning (REDQ), using a high update-to-data ratio and target randomization, was shown to achieve high sample efficiency that is competitive with state-of-the-art model-based methods. In this paper, we propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves the sample-efficiency performance of REDQ and the asymptotic performance of TQC, thereby providing overall state-of-the-art performance during all stages of training. Moreover, AQE is very simple, requiring neither distributional representation of critics nor target randomization.