We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts. We adapt the information-theoretic perspective of Russo and Van Roy [2016] to the contextual setting by introducing a new concept of information ratio based on the mutual information between the unknown model parameter and the observed loss. This allows us to bound the regret in terms of the entropy of the prior distribution through a remarkably simple proof, and with no structural assumptions on the likelihood or the prior. The extension to priors with infinite entropy only requires a Lipschitz assumption on the log-likelihood. An interesting special case is that of logistic bandits with d-dimensional parameters, K actions, and Lipschitz logits, for which we provide a $\widetilde{O}(\sqrt{dKT})$ regret upper-bound that does not depend on the smallest slope of the sigmoid link function.
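As an illustration of the setting above, here is a minimal sketch of Thompson Sampling for a contextual bandit with binary losses. It assumes a uniform prior over a finite set of candidate parameters, so the prior entropy is the logarithm of the number of candidates. The logistic loss model, the scalar contexts, and all function names are hypothetical choices for this example, not taken from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_prob(theta, context, action):
    # Assumed toy model: P(loss = 1) = sigmoid(theta[action] * context).
    return sigmoid(theta[action] * context)


def posterior_weights(log_post):
    # Normalise log-posterior values into probabilities (stable softmax).
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]


def thompson_sampling(candidates, true_theta, contexts, rng):
    """Run Thompson Sampling with a uniform prior over `candidates`."""
    log_post = [0.0] * len(candidates)  # uniform prior, unnormalised
    total_loss = 0
    for x in contexts:
        # 1. Sample a parameter from the current posterior.
        theta = rng.choices(candidates, weights=posterior_weights(log_post), k=1)[0]
        # 2. Play the action with the smallest expected loss under the sample.
        action = min(range(len(theta)), key=lambda k: loss_prob(theta, x, k))
        # 3. Observe a binary loss drawn from the true model.
        loss = 1 if rng.random() < loss_prob(true_theta, x, action) else 0
        total_loss += loss
        # 4. Bayesian update: add the log-likelihood of the observed loss.
        for i, cand in enumerate(candidates):
            p = loss_prob(cand, x, action)
            log_post[i] += math.log(p if loss == 1 else 1.0 - p)
    return total_loss, posterior_weights(log_post)
```

The contexts are passed in as an arbitrary sequence, so they may be chosen adversarially. Only the losses are random.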
Recently, prompt-based learning has become a popular solution in many Natural Language Processing (NLP) tasks. It inserts a template into the model input, converting the task into a cloze-style one to smooth out differences between the Pre-trained Language Model (PLM) and the current task. In relation classification, however, it is difficult to map the masked output to the relation labels because of their abundant semantic information, e.g., ``org:founded_by''. Therefore, a pre-trained model still needs enough labelled data to fit the relations. To mitigate this challenge, we present a novel prompt-based learning method, namely LabelPrompt, for the relation classification task. It is an extraordinarily intuitive approach driven by one motivation: ``GIVE MODEL CHOICES!''. First, we define additional tokens to represent the relation labels, treat these tokens as a verbalizer with semantic initialisation, and construct them with a prompt template method. Then, to address the inconsistency between the predicted relation and the given entities, we design an entity-aware module based on contrastive learning. Finally, we apply an attention query strategy to the self-attention layers to differentiate the two types of tokens, prompt tokens and sequence tokens. The proposed strategy effectively improves the adaptation capability of prompt-based learning in relation classification when only a small amount of labelled data is available. Extensive experimental results on several benchmark datasets demonstrate the superiority of the proposed LabelPrompt method, particularly in the few-shot scenario.
In this paper, we propose a self-supervised twin network approach based on this prior knowledge. The method generates approximate edge information for an image and then differentially eliminates the edge errors in the reconstructed image with a dilation algorithm. This improves the accuracy of the reconstructed image and separates foreign matter and noise from the original image, so that it can be visualized in a more practical scene.
In this work, we consider the problem of multi-step channel prediction in wireless communication systems. In existing works, autoregressive (AR) models are either replaced by or combined with feed-forward neural networks (NNs) or, alternatively, recurrent neural networks (RNNs). This paper explores the possibility of using sequence-to-sequence (Seq2Seq) and transformer neural network (TNN) models for channel state information (CSI) prediction. Simulation results show that both Seq2Seq and TNN models are appealing alternatives to RNNs and feed-forward NNs in the context of CSI prediction. Additionally, with a few adaptations, the TNN extrapolates better than the other models to CSI sequences that are shorter or longer than those seen during training.
Large-scale embedding-based retrieval (EBR) is the cornerstone of search-related industrial applications. Given a user query, an EBR system aims to identify relevant information in a large corpus that may contain tens or hundreds of billions of documents. Storage and computation become expensive and inefficient with such massive corpora and highly concurrent queries, making it difficult to scale up further. To tackle this challenge, we propose a binary embedding-based retrieval (BEBR) engine equipped with a recurrent binarization algorithm that enables a customized number of bits per dimension. Specifically, we compress the full-precision query and document embeddings, generally formulated as float vectors, into a composition of multiple binary vectors using a lightweight transformation model with residual multilayer perceptron (MLP) blocks. We can therefore tailor the number of bits for different applications to trade off accuracy loss against cost savings. Importantly, we enable task-agnostic, efficient training of the binarization model using a new embedding-to-embedding strategy. We also exploit compatible training of binary embeddings so that the BEBR engine can support indexing among multiple embedding versions within a unified system. To further realize efficient search, we propose Symmetric Distance Calculation (SDC) to achieve lower response time than Hamming codes. We have successfully deployed BEBR in Tencent products, including Sogou, Tencent Video, and QQ World. The binarization algorithm can be seamlessly generalized to various tasks with multiple modalities. Extensive experiments on offline benchmarks and online A/B tests demonstrate the efficiency and effectiveness of our method, which saves 30% to 50% of index costs with almost no loss of accuracy at the system level.
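The idea of composing a float embedding from several binary vectors can be sketched with a greedy residual scheme. This is a simplified stand-in and not the BEBR model: the learned MLP transformation is replaced here by fixed scales 2^-k, and the function names are hypothetical. The sketch assumes every input coordinate lies in [-2, 2].

```python
def sign_bits(vec):
    # Map each coordinate to +1 or -1 (ties at zero go to +1).
    return [1.0 if v >= 0 else -1.0 for v in vec]


def recurrent_binarize(vec, n_bits):
    """Greedy residual binarization with assumed fixed scales 2^-k.

    Returns `n_bits` binary vectors b_0, ..., b_{n_bits-1} such that
    sum_k 2^-k * b_k approximates `vec`.
    """
    residual = list(vec)
    codes = []
    for k in range(n_bits):
        b = sign_bits(residual)            # binarize what is still unexplained
        scale = 2.0 ** (-k)
        residual = [r - scale * bi for r, bi in zip(residual, b)]
        codes.append(b)
    return codes


def reconstruct(codes):
    # Recombine the binary vectors into a float approximation.
    dim = len(codes[0])
    return [
        sum(2.0 ** (-k) * codes[k][i] for k in range(len(codes)))
        for i in range(dim)
    ]
```

Under the stated range assumption, each extra bit halves the worst-case reconstruction error per coordinate, which is 2^(1 - n_bits). This is the kind of accuracy-versus-cost trade-off that a configurable number of bits per dimension exposes.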
Neural operators, which use deep neural networks to approximate the solution mappings of partial differential equation (PDE) systems, are emerging as a new paradigm for PDE simulation. Neural operators can be trained in a supervised or unsupervised way, i.e., by using generated data or the PDE information. The unsupervised training approach is essential when data generation is costly or the data is of low quality (e.g., insufficient or noisy). However, its performance and efficiency leave considerable room for improvement. To this end, we design a new loss function based on the Feynman-Kac formula and call the resulting neural operator the Monte-Carlo Neural Operator (MCNO), which allows larger temporal steps and efficiently handles fractional diffusion operators. Our analyses show that MCNO has advantages in handling complex spatial conditions and larger temporal steps compared with other unsupervised methods. Furthermore, MCNO is more robust to the perturbations introduced by the numerical scheme and the operator approximation. Numerical experiments on the diffusion equation and the Navier-Stokes equation show significant accuracy improvements over other unsupervised baselines, especially for the vibrated initial condition and long-time simulation settings.
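For intuition about the Feynman-Kac formula mentioned above, the following sketch estimates the solution of the one-dimensional heat equation u_t = kappa * u_xx by Monte Carlo, using the representation u(x, t) = E[u0(x + sqrt(2 * kappa * t) * Z)] with Z a standard normal variable. It illustrates only this probabilistic representation, not the MCNO loss function. The function name and parameters are hypothetical.

```python
import math
import random


def feynman_kac_heat(u0, x, t, kappa, n_samples, rng):
    """Monte Carlo estimate of u(x, t) for u_t = kappa * u_xx, u(., 0) = u0."""
    # A Brownian path with generator kappa * d^2/dx^2 has std sqrt(2*kappa*t).
    sigma = math.sqrt(2.0 * kappa * t)
    total = 0.0
    for _ in range(n_samples):
        # Evaluate the initial condition at the randomly displaced point.
        total += u0(x + sigma * rng.gauss(0.0, 1.0))
    return total / n_samples
```

For u0 = sin, the exact solution is sin(x) * exp(-kappa * t), so the estimate can be checked directly. Note that the time t enters in a single step, with no time-stepping loop.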
Physiological signals are high-dimensional time series of great practical value in medical and healthcare applications. However, previous works on their classification fail to obtain promising results due to intractable data characteristics and severe label sparsity. In this paper, we address these challenges by proposing a more effective and interpretable scheme tailored to the physiological signal classification task. Specifically, we exploit time series shapelets to extract prominent local patterns and perform interpretable sequence discretization to distill the whole-series information. By doing so, the long and continuous raw signals are compressed into short and discrete token sequences in which both local patterns and global contexts are well preserved. Moreover, to alleviate the label sparsity issue, we design a multi-scale transformation strategy to augment the data and devise a cross-scale contrastive learning mechanism to guide model training. We name our method ShapeWordNet and conduct extensive experiments on three real-world datasets to investigate its effectiveness. Comparative results show that our proposed scheme remarkably outperforms four categories of cutting-edge approaches. Visualization analysis further confirms the good interpretability of the shapelet-based sequence discretization idea.
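The discretization step described above can be sketched as follows: slide a window over the signal and replace each window with the index of its nearest shapelet. This is a minimal illustration under assumptions. The shapelets are hand-picked rather than learned, plain Euclidean distance is used, and the function names are hypothetical rather than taken from ShapeWordNet.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def discretize(series, shapelets, stride=None):
    """Turn a continuous series into a sequence of shapelet indices (tokens)."""
    length = len(shapelets[0])          # all shapelets assumed equally long
    stride = stride or length           # default: non-overlapping windows
    tokens = []
    for start in range(0, len(series) - length + 1, stride):
        window = series[start:start + length]
        # Token = index of the shapelet closest to this window.
        nearest = min(
            range(len(shapelets)),
            key=lambda k: euclidean(window, shapelets[k]),
        )
        tokens.append(nearest)
    return tokens


# Hypothetical shapelets: rising, falling, and flat patterns.
shapelets = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
signal = [0.0, 1.0, 2.0, 2.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.1, 0.9, 2.1]
print(discretize(signal, shapelets))  # → [0, 1, 2, 0]
```

Each token maps back to a concrete shapelet, so a reader can inspect which local pattern every position in the token sequence stands for.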
Blur in a facial image has a great impact on high-level vision tasks such as face recognition. The purpose of facial image deblurring is to recover a clear image from a blurry input image, which can improve recognition accuracy. General deblurring methods cannot perform well on facial images. Therefore, face deblurring methods have been proposed that improve performance by adding semantic or structural information as specific priors according to the characteristics of facial images. This paper surveys and summarizes recently published methods for facial image deblurring, most of which are based on deep learning. First, we give a brief introduction to the modeling of image blur. Next, we group face deblurring methods into two categories, namely model-based methods and deep learning-based methods. Furthermore, we summarize the datasets, loss functions, and performance evaluation metrics commonly used in the neural network training process. We show the performance of classical methods on these datasets and metrics and briefly discuss the differences between model-based and learning-based methods. Finally, we discuss current challenges and possible future research directions.
In this paper, we study the Tiered Reinforcement Learning setting, a parallel transfer learning framework, where the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task to reduce the exploration risk of the latter while solving the two tasks in parallel. Unlike previous work, we do not assume that the low-tier and high-tier tasks share the same dynamics or reward functions, and we focus on robust knowledge transfer without prior knowledge of the task similarity. We identify a natural and necessary condition for our objective, called "Optimal Value Dominance". Under this condition, we propose novel online learning algorithms that, for the high-tier task, achieve constant regret on a subset of states depending on the task similarity and retain near-optimal regret when the two tasks are dissimilar, while, for the low-tier task, remaining near-optimal without any sacrifice. Moreover, we study the setting with multiple low-tier tasks and propose a novel transfer source selection mechanism, which can combine the information from all low-tier tasks and allows provable benefits on a much larger state-action space.
Calibrating deep learning models to yield uncertainty-aware predictions is crucial as deep neural networks get increasingly deployed in safety-critical applications. While existing post-hoc calibration methods achieve impressive results on in-domain test datasets, they are limited by their inability to yield reliable uncertainty estimates in domain-shift and out-of-domain (OOD) scenarios. We aim to bridge this gap by proposing DAC, an accuracy-preserving as well as Density-Aware Calibration method based on k-nearest-neighbors (KNN). In contrast to existing post-hoc methods, we utilize hidden layers of classifiers as a source for uncertainty-related information and study their importance. We show that DAC is a generic method that can readily be combined with state-of-the-art post-hoc methods. DAC boosts the robustness of calibration performance in domain-shift and OOD, while maintaining excellent in-domain predictive uncertainty estimates. We demonstrate that DAC leads to consistently better calibration across a large number of model architectures, datasets, and metrics. Additionally, we show that DAC improves calibration substantially on recent large-scale neural networks pre-trained on vast amounts of data.
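The sketch below illustrates one way a KNN-based density signal could rescale logits without changing the predicted class. It is an assumption-laden illustration and not the DAC method itself: it uses a single feature layer, a temperature that grows linearly with the mean distance to the k nearest training features, and hypothetical fixed positive weights `w0` and `w1`, which in practice would presumably be fitted on held-out data.

```python
import math


def knn_score(feature, train_features, k):
    # Mean Euclidean distance to the k nearest training features.
    dists = sorted(math.dist(feature, f) for f in train_features)
    return sum(dists[:k]) / k


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def density_aware_probs(logits, feature, train_features, k=2, w0=1.0, w1=1.0):
    """Soften predictions for inputs that lie far from the training data."""
    # Assumed form: temperature is affine in the KNN distance (w0, w1 > 0).
    temperature = w0 + w1 * knn_score(feature, train_features, k)
    # Dividing all logits by one positive scalar keeps the argmax unchanged,
    # so accuracy is preserved while confidence is adjusted.
    return softmax([z / temperature for z in logits])
```

With the same logits, an input whose hidden features are far from the training features receives a flatter probability vector than one close to them. The predicted class is identical in both cases.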