In this paper, we propose a new dense retrieval model which learns diverse document representations with deep query interactions. Our model encodes each document with a set of generated pseudo-queries to get query-informed, multi-view document representations. It not only enjoys high inference efficiency like the vanilla dual-encoder models, but also enables deep query-document interactions in document encoding and provides multi-faceted representations to better match different queries. Experiments on several benchmarks demonstrate the effectiveness of the proposed method, out-performing strong dual encoder baselines.The code is available at \url{https://github.com/jordane95/dual-cross-encoder
Few-shot learning (FSL) aims to recognize novel queries with only a few support samples through leveraging prior knowledge from a base dataset. In this paper, we consider the domain shift problem in FSL and aim to address the domain gap between the support set and the query set. Different from previous cross-domain FSL work (CD-FSL) that considers the domain shift between base and novel classes, the new problem, termed cross-domain cross-set FSL (CDSC-FSL), requires few-shot learners not only to adapt to the new domain, but also to be consistent between different domains within each novel class. To this end, we propose a novel approach, namely stabPA, to learn prototypical compact and cross-domain aligned representations, so that the domain shift and few-shot learning can be addressed simultaneously. We evaluate our approach on two new CDCS-FSL benchmarks built from the DomainNet and Office-Home datasets respectively. Remarkably, our approach outperforms multiple elaborated baselines by a large margin, e.g., improving 5-shot accuracy by 6.0 points on average on DomainNet. Code is available at https://github.com/WentaoChen0813/CDCS-FSL
Joint Super-Resolution and Inverse Tone-Mapping (joint SR-ITM) aims to increase the resolution and dynamic range of low-resolution and standard dynamic range images.Recent methods mainly resort to image decomposition techniques with the multi-branch network architecture.However, the rigid decomposition employed by these methods largely restricts their power on diverse images.To exploit its potential power, in this paper, we generalize the decomposition mechanism from the image domain to the broader feature domain. To this end, we propose a lightweight Feature Decomposition Aggregation Network (FDAN). In particular, we design a Feature Decomposition Block (FDB), which can achieve learnable separation of feature details and contrasts.By cascading FDBs, we can build up a Hierarchical Feature Decomposition Group for powerful multi-level feature decomposition.Moreover, we collect a new benchmark dataset for joint SR-ITM, \ie, SRITM-4K, which is large-scale and provides versatile scenarios for sufficient model training and evaluation.Experimental results on two benchmark datasets demonstrate that our FDAN is efficient and outperforms previous methods on joint SR-ITM.Our code and dataset will be publicly released.
In this paper, we propose SimLM (Similarity matching with Language Model pre-training), a simple yet effective pre-training method for dense passage retrieval. It employs a simple bottleneck architecture that learns to compress the passage information into a dense vector through self-supervised pre-training. We use a replaced language modeling objective, which is inspired by ELECTRA, to improve the sample efficiency and reduce the mismatch of the input distribution between pre-training and fine-tuning. SimLM only requires access to unlabeled corpus, and is more broadly applicable when there are no labeled data or queries. We conduct experiments on several large-scale passage retrieval datasets, and show substantial improvements over strong baselines under various settings. Remarkably, SimLM even outperforms multi-vector approaches such as ColBERTv2 which incurs significantly more storage cost.
Click-through rate (CTR) prediction is a fundamental technique in recommendation and advertising systems. Recent studies have proved that learning a unified model to serve multiple domains is effective to improve the overall performance. However, it is still challenging to improve generalization across domains under limited training data, and hard to deploy current solutions due to their computational complexity. In this paper, we propose a simple yet effective framework AdaSparse for multi-domain CTR prediction, which learns adaptively sparse structure for each domain, achieving better generalization across domains with lower computational cost. In AdaSparse, we introduce domain-aware neuron-level weighting factors to measure the importance of neurons, with that for each domain our model can prune redundant neurons to improve generalization. We further add flexible sparsity regularizations to control the sparsity ratio of learned structures. Offline and online experiments show that AdaSparse outperforms previous multi-domain CTR models significantly.
Gait recognition aims at identifying a person at a distance through visual cameras. With the emergence of deep learning, significant advancements in gait recognition have achieved inspiring success in many scenarios by utilizing deep learning techniques. Nevertheless, the increasing need for video surveillance introduces more challenges, including robust recognition under various variances, modeling motion information in gait sequences, unfair performance comparison due to protocol variances, biometrics security, and privacy prevention. This paper provides a comprehensive survey of deep learning for gait recognition. We first present the odyssey of gait recognition from traditional algorithms to deep models, providing explicit knowledge of the whole workflow of a gait recognition system. Then deep learning for gait recognition is discussed from the perspective of deep representations and architecture with an in-depth summary. Specifically, deep gait representations are categorized into static and dynamic features, while deep architectures include single-stream and multi-stream architecture. Following our proposed taxonomy with novelty, it can be beneficial for providing inspiration and promoting the perception of deep gait recognition. Besides, we also present a comprehensive summary of all vision-based gait datasets and the performance analysis. Finally, the article discusses some open issues with significant potential prospects.
This report presents the methods of the winning entry of the RxR-Habitat Competition in CVPR 2022. The competition addresses the problem of Vision-and-Language Navigation in Continuous Environments (VLN-CE), which requires an agent to follow step-by-step natural language instructions to reach a target. We present a modular plan-and-control approach for the task. Our model consists of three modules: the candidate waypoints predictor (CWP), the history enhanced planner and the tryout controller. In each decision loop, CWP first predicts a set of candidate waypoints based on depth observations from multiple views. It can reduce the complexity of the action space and facilitate planning. Then, a history-enhanced planner is adopted to select one of the candidate waypoints as the subgoal. The planner additionally encodes historical memory to track the navigation progress, which is especially effective for long-horizon navigation. Finally, we propose a non-parametric heuristic controller named tryout to execute low-level actions to reach the planned subgoal. It is based on the trial-and-error mechanism which can help the agent to avoid obstacles and escape from getting stuck. All three modules work hierarchically until the agent stops. We further take several recent advances of Vision-and-Language Navigation (VLN) to improve the performance such as pretraining based on large-scale synthetic in-domain dataset, environment-level data augmentation and snapshot model ensemble. Our model won the RxR-Habitat Competition 2022, with 48% and 90% relative improvements over existing methods on NDTW and SR metrics respectively.
Real-Time Bidding (RTB) is an important mechanism in modern online advertising systems. Advertisers employ bidding strategies in RTB to optimize their advertising effects subject to various financial requirements, especially the return-on-investment (ROI) constraint. ROIs change non-monotonically during the sequential bidding process, and often induce a see-saw effect between constraint satisfaction and objective optimization. While some existing approaches show promising results in static or mildly changing ad markets, they fail to generalize to highly dynamic ad markets with ROI constraints, due to their inability to adaptively balance constraints and objectives amidst non-stationarity and partial observability. In this work, we specialize in ROI-Constrained Bidding in non-stationary markets. Based on a Partially Observable Constrained Markov Decision Process, our method exploits an indicator-augmented reward function free of extra trade-off parameters and develops a Curriculum-Guided Bayesian Reinforcement Learning (CBRL) framework to adaptively control the constraint-objective trade-off in non-stationary ad markets. Extensive experiments on a large-scale industrial dataset with two problem settings reveal that CBRL generalizes well in both in-distribution and out-of-distribution data regimes, and enjoys superior learning efficiency and stability.