Current research in iris recognition is moving towards enabling more relaxed acquisition conditions. This has effects on the quality of acquired images, with low resolution being a predominant issue. Here, we evaluate a super-resolution algorithm used to reconstruct iris images based on Eigen-transformation of local image patches. Each patch is reconstructed separately, allowing better quality of enhanced images by preserving local information. Contrast enhancement is used to improve the reconstruction quality, while matcher fusion has been adopted to improve iris recognition performance. We validate the system using a database of 1,872 near-infrared iris images. The presented approach is superior to bilinear or bicubic interpolation, especially at lower resolutions, and the fusion of the two systems pushes the EER to below 5% for down-sampling factors up to a image size of only 13x13.
Emotional Support Conversation (ESConv) aims to reduce help-seekers'emotional distress with the supportive strategy and response. It is essential for the supporter to select an appropriate strategy with the feedback of the help-seeker (e.g., emotion change during dialog turns, etc) in ESConv. However, previous methods mainly focus on the dialog history to select the strategy and ignore the help-seeker's feedback, leading to the wrong and user-irrelevant strategy prediction. In addition, these approaches only model the context-to-strategy flow and pay less attention to the strategy-to-context flow that can focus on the strategy-related context for generating the strategy-constrain response. In this paper, we propose a Feedback-Aware Double COntrolling Network (FADO) to make a strategy schedule and generate the supportive response. The core module in FADO consists of a dual-level feedback strategy selector and a double control reader. Specifically, the dual-level feedback strategy selector leverages the turn-level and conversation-level feedback to encourage or penalize strategies. The double control reader constructs the novel strategy-to-context flow for generating the strategy-constrain response. Furthermore, a strategy dictionary is designed to enrich the semantic information of the strategy and improve the quality of strategy-constrain response. Experimental results on ESConv show that the proposed FADO has achieved the state-of-the-art performance in terms of both strategy selection and response generation. Our code is available at https://github/after/reviewing.
Graph convolutional learning has led to many exciting discoveries in diverse areas. However, in some applications, traditional graphs are insufficient to capture the structure and intricacies of the data. In such scenarios, multigraphs arise naturally as discrete structures in which complex dynamics can be embedded. In this paper, we develop convolutional information processing on multigraphs and introduce convolutional multigraph neural networks (MGNNs). To capture the complex dynamics of information diffusion within and across each of the multigraph's classes of edges, we formalize a convolutional signal processing model, defining the notions of signals, filtering, and frequency representations on multigraphs. Leveraging this model, we develop a multigraph learning architecture, including a sampling procedure to reduce computational complexity. The introduced architecture is applied towards optimal wireless resource allocation and a hate speech localization task, offering improved performance over traditional graph neural networks.
Federated clustering is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", the problem of machine unlearning for federated clustering methods has become of significant importance. This work proposes the first known unlearning mechanism for federated clustering with privacy criteria that support simple, provable, and efficient data removal at the client and server level. The gist of our approach is to combine special initialization procedures with quantization methods that allow for secure aggregation of estimated local cluster counts at the server unit. As part of our platform, we introduce secure compressed multiset aggregation (SCMA), which is of independent interest for secure sparse model aggregation. In order to simultaneously facilitate low communication complexity and secret sharing protocols, we integrate Reed-Solomon encoding with special evaluation points into the new SCMA pipeline and derive bounds on the time and communication complexity of different components of the scheme. Compared to completely retraining K-means++ locally and globally for each removal request, we obtain an average speed-up of roughly 84x across seven datasets, two of which contain biological and medical information that is subject to frequent unlearning requests.
We propose a Vision-Language Transformer (VLT) framework for referring segmentation to facilitate deep interactions among multi-modal information and enhance the holistic understanding to vision-language features. There are different ways to understand the dynamic emphasis of a language expression, especially when interacting with the image. However, the learned queries in existing transformer works are fixed after training, which cannot cope with the randomness and huge diversity of the language expressions. To address this issue, we propose a Query Generation Module, which dynamically produces multiple sets of input-specific queries to represent the diverse comprehensions of language expression. To find the best among these diverse comprehensions, so as to generate a better mask, we propose a Query Balance Module to selectively fuse the corresponding responses of the set of queries. Furthermore, to enhance the model's ability in dealing with diverse language expressions, we consider inter-sample learning to explicitly endow the model with knowledge of understanding different language expressions to the same object. We introduce masked contrastive learning to narrow down the features of different expressions for the same target object while distinguishing the features of different objects. The proposed approach is lightweight and achieves new state-of-the-art referring segmentation results consistently on five datasets.
A major challenge for matching-based depth estimation is to prevent mismatches in occlusion and smooth regions. An effective matching window satisfying three characteristics: texture richness, disparity consistency and anti-occlusion should be able to prevent mismatches to some extent. According to these characteristics, we propose matching entropy in the spatial domain of light field to measure the amount of correct information in a matching window, which provides the criterion for matching window selection. Based on matching entropy regularization, we establish an optimization model for depth estimation with a matching cost fidelity term. To find the optimum, we propose a two-step adaptive matching algorithm. First, the region type is adaptively determined to identify occluding, occluded, smooth and textured regions. Then, the matching entropy criterion is used to adaptively select the size and shape of matching windows, as well as the visible viewpoints. The two-step process can reduce mismatches and redundant calculations by selecting effective matching windows. The experimental results on synthetic and real data show that the proposed method can effectively improve the accuracy of depth estimation in occlusion and smooth regions and has strong robustness for different noise levels. Therefore, high-precision depth estimation from 4D light field data is achieved.
Fiber optic shape sensors have enabled unique advances in various navigation tasks, from medical tool tracking to industrial applications. Eccentric fiber Bragg gratings (FBG) are cheap and easy-to-fabricate shape sensors that are often interrogated with simple setups. However, using low-cost interrogation systems for such intensity-based quasi-distributed sensors introduces further complications to the sensor's signal. Therefore, eccentric FBGs have not been able to accurately estimate complex multi-bend shapes. Here, we present a novel technique to overcome these limitations and provide accurate and precise shape estimation in eccentric FBG sensors. We investigate the most important bending-induced effects in curved optical fibers that are usually eliminated in intensity-based fiber sensors. These effects contain shape deformation information with a higher spatial resolution that we are now able to extract using deep learning techniques. We design a deep learning model based on a convolutional neural network that is trained to predict shapes given the sensor's spectra. We also provide a visual explanation, highlighting wavelength elements whose intensities are more relevant in making shape predictions. These findings imply that deep learning techniques benefit from the bending-induced effects that impact the desired signal in a complex manner. This is the first step toward cheap yet accurate fiber shape sensing solutions.
The traditional centralized baseband processing architecture is faced with the bottlenecks of high computation complexity and excessive fronthaul communication, especially when the number of antennas at the base station (BS) is large. To cope with these two challenges, the decentralized baseband processing (DPB) architecture has been proposed, where the BS antennas are partitioned into multiple clusters, and each is connected to a local baseband unit (BBU). In this paper, we are interested in the low-complexity distributed channel estimation (CE) method under such DBP architecture, which is rarely studied in the literature. The aim is to devise distributed CE algorithms that can perform as well as the centralized scheme but with a small inter-BBU communication cost. Specifically, based on the low-complexity diagonal minimum mean square error channel estimator, we propose two distributed CE algorithms, namely the aggregate-then-estimate algorithm and the estimate-then-aggregate algorithm. In contrast to the existing distributed CE algorithm which requires iterative information exchanges among the nodes, our algorithms only require one roundtrip communication among BBUs. Extensive experiment results are presented to demonstrate the advantages of the proposed distributed CE algorithms in terms of estimation accuracy, inter-BBU communication cost, and computation complexity.
As a fine-grained and local expression behavior measurement, facial action unit (FAU) analysis (e.g., detection and intensity estimation) has been documented for its time-consuming, labor-intensive, and error-prone annotation. Thus a long-standing challenge of FAU analysis arises from the data scarcity of manual annotations, limiting the generalization ability of trained models to a large extent. Amounts of previous works have made efforts to alleviate this issue via semi/weakly supervised methods and extra auxiliary information. However, these methods still require domain knowledge and have not yet avoided the high dependency on data annotation. This paper introduces a robust facial representation model MAE-Face for AU analysis. Using masked autoencoding as the self-supervised pre-training approach, MAE-Face first learns a high-capacity model from a feasible collection of face images without additional data annotations. Then after being fine-tuned on AU datasets, MAE-Face exhibits convincing performance for both AU detection and AU intensity estimation, achieving a new state-of-the-art on nearly all the evaluation results. Further investigation shows that MAE-Face achieves decent performance even when fine-tuned on only 1\% of the AU training set, strongly proving its robustness and generalization performance.
The deep CNNs in image semantic segmentation typically require a large number of densely-annotated images for training and have difficulties in generalizing to unseen object categories. Therefore, few-shot segmentation has been developed to perform segmentation with just a few annotated examples. In this work, we tackle the few-shot segmentation using a self-regularized prototypical network (SRPNet) based on prototype extraction for better utilization of the support information. The proposed SRPNet extracts class-specific prototype representations from support images and generates segmentation masks for query images by a distance metric - the fidelity. A direct yet effective prototype regularization on support set is proposed in SRPNet, in which the generated prototypes are evaluated and regularized on the support set itself. The extent to which the generated prototypes restore the support mask imposes an upper limit on performance. The performance on the query set should never exceed the upper limit no matter how complete the knowledge is generalized from support set to query set. With the specific prototype regularization, SRPNet fully exploits knowledge from the support and offers high-quality prototypes that are representative for each semantic class and meanwhile discriminative for different classes. The query performance is further improved by an iterative query inference (IQI) module that combines a set of regularized prototypes. Our proposed SRPNet achieves new state-of-art performance on 1-shot and 5-shot segmentation benchmarks.