In this paper, we investigate the generalization problem of person re-identification (re-id), whose major challenge is the distribution shift on an unseen domain. As an important tool for regularizing feature distributions, batch normalization (BN) has been widely used in existing methods. However, these methods neglect that BN is severely biased toward the training domain and inevitably suffers a performance drop when applied to an unseen domain without being updated. To tackle this issue, we propose Batch Norm Test-time Adaption (BNTA), a novel re-id framework that applies a self-supervised strategy to update the BN parameters adaptively. Specifically, BNTA quickly explores the domain-aware information within unlabeled target data before inference, and accordingly modulates the feature distribution normalized by BN to adapt to the target domain. This is accomplished by two designed self-supervised auxiliary tasks, namely part positioning and part nearest neighbor matching, which help the model mine domain-aware information with respect to the structure and identity of body parts, respectively. To demonstrate the effectiveness of our method, we conduct extensive experiments on three re-id datasets and confirm its superior performance over state-of-the-art methods.
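A minimal sketch of the core idea behind test-time BN adaptation: before inference, the BN running statistics are re-estimated on unlabeled target-domain batches. The backbone, data loader, and the choice of cumulative averaging are illustrative assumptions; the paper's part-based self-supervised auxiliary tasks are not shown here.

```python
import torch
import torch.nn as nn

def adapt_bn_statistics(model: nn.Module, target_loader, device="cpu"):
    """Re-estimate BatchNorm running statistics on unlabeled target-domain
    batches before inference (the self-supervised auxiliary tasks that BNTA
    also uses to tune the affine parameters are omitted in this sketch)."""
    model.to(device).eval()
    # Put only the BN layers into training mode and reset their statistics,
    # so forward passes over target data refresh the running mean/variance.
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.train()
            m.reset_running_stats()
            m.momentum = None  # use cumulative moving averages
    with torch.no_grad():
        for images in target_loader:  # unlabeled target-domain images
            model(images.to(device))
    model.eval()
    return model

# Usage (hypothetical backbone and loader):
# model = adapt_bn_statistics(reid_backbone, target_gallery_loader)
```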
The prevalence and perniciousness of fake news have become a critical issue on the Internet, which in turn stimulates the development of automatic fake news detection. In this paper, we focus on evidence-based fake news detection, where several pieces of evidence are utilized to probe the veracity of news (i.e., a claim). Most previous methods first employ sequential models to embed the semantic information and then capture the claim-evidence interaction with different attention mechanisms. Despite their effectiveness, they still suffer from two main weaknesses. Firstly, due to the inherent drawbacks of sequential models, they fail to integrate relevant information that is scattered far apart across the evidence for veracity checking. Secondly, they neglect the considerable redundant information in the evidence, which may be useless or even harmful. To solve these problems, we propose a unified Graph-based sEmantic sTructure mining framework, named GET for short. Specifically, unlike existing work that treats claims and evidence as sequences, we model them as graph-structured data and capture the long-distance semantic dependency among dispersed relevant snippets via neighborhood propagation. After obtaining contextual semantic information, our model reduces information redundancy by performing graph structure learning. Finally, the fine-grained semantic representations are fed into the downstream claim-evidence interaction module for predictions. Comprehensive experiments demonstrate the superiority of GET over state-of-the-art methods.
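To make the two steps concrete, here is a toy illustration (not the GET implementation) of neighborhood propagation over an evidence graph followed by a simple similarity-based edge pruning; the graph size, feature dimension, and pruning threshold are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy evidence graph: 6 snippet nodes with 16-dim embeddings and a
# symmetric adjacency matrix (1 = an edge between two snippets).
x = torch.randn(6, 16)
adj = (torch.rand(6, 6) > 0.5).float()
adj = ((adj + adj.t()) > 0).float().fill_diagonal_(1.0)
w = torch.randn(16, 16) * 0.1

# 1) Neighborhood propagation (basic GCN-style layer) lets distant but
#    relevant snippets exchange information.
deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
h = F.relu((adj / deg) @ x @ w)

# 2) Toy graph structure learning: drop weak edges by feature similarity,
#    mimicking the redundancy-reduction step described in the abstract.
sim = F.cosine_similarity(h.unsqueeze(1), h.unsqueeze(0), dim=-1)
pruned_adj = adj * (sim > 0.2).float()
print(pruned_adj.sum().item(), "edges kept of", adj.sum().item())
```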
In computer vision, some attribution methods for explaining CNNs attempt to study how the intermediate features affect the network prediction. However, they usually ignore the feature hierarchies among the intermediate features. This paper introduces a hierarchical decomposition framework to explain a CNN's decision-making process in a top-down manner. Specifically, we propose a gradient-based activation propagation (gAP) module that can decompose any intermediate CNN decision into its lower layers and find the supporting features. We then utilize the gAP module to iteratively decompose the network decision into supporting evidence from different CNN layers. The proposed framework can generate a deep hierarchy of strongly associated supporting evidence for the network decision, which provides insight into the decision-making process. Moreover, gAP can be applied to CNN-based models without modifying the network architecture or any extra training. Experiments show the effectiveness of the proposed method. The code and an interactive demo website will be made publicly available.
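As a rough illustration of gradient-based decomposition onto a lower layer, the following Grad-CAM-style sketch weights a lower layer's activations by the gradients flowing back from the decision; the iterative top-down hierarchy of gAP is not reproduced, and the use of resnet18/layer3 is only an assumption for the example.

```python
import torch
import torchvision

# Decompose the top-class decision onto layer3 via gradient-weighted activations.
model = torchvision.models.resnet18(weights=None).eval()
feats = {}

def save_feat(name):
    def hook(_m, _inp, out):
        out.retain_grad()   # keep the gradient of this intermediate feature map
        feats[name] = out
    return hook

model.layer3.register_forward_hook(save_feat("layer3"))

x = torch.randn(1, 3, 224, 224)
logits = model(x)
logits[0, logits.argmax()].backward()   # back-propagate the decision

a, g = feats["layer3"], feats["layer3"].grad
weights = g.mean(dim=(2, 3), keepdim=True)        # per-channel importance
support = torch.relu((weights * a).sum(dim=1))    # spatial support map
print(support.shape)  # (1, 14, 14): lower-layer locations supporting the decision
```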
In online advertising, conventional post-click conversion rate (CVR) estimation models are trained using clicked samples. However, during online serving the models need to estimate the CVR for all impressions, leading to the sample selection bias (SSB) issue. Intuitively, providing reliable supervision signals for unclicked ads is a feasible way to alleviate the SSB issue. This paper proposes an uncertainty-regularized knowledge distillation (UKD) framework to debias CVR estimation by distilling knowledge from unclicked ads. A teacher model learns click-adaptive representations and produces pseudo-conversion labels on unclicked ads as supervision signals. A student model is then trained on both clicked and unclicked ads with knowledge distillation, performing uncertainty modeling to alleviate the inherent noise in the pseudo-labels. Experiments on billion-scale datasets show that UKD outperforms previous debiasing methods, and online results verify that UKD achieves significant improvements.
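A toy loss sketch in the spirit of this setup, combining ground-truth supervision on clicked ads with teacher pseudo-labels on unclicked ads that are down-weighted by a learned per-sample uncertainty; this is not the paper's exact formulation, and all tensors below are placeholders.

```python
import torch
import torch.nn.functional as F

def ukd_style_loss(student_logits_clicked, labels_clicked,
                   student_logits_unclicked, teacher_probs_unclicked,
                   log_var_unclicked):
    """Toy distillation objective: BCE on clicked ads with true labels, plus
    BCE against teacher pseudo-labels on unclicked ads, weighted by a learned
    per-sample log-variance so that uncertain pseudo-labels contribute less."""
    loss_clicked = F.binary_cross_entropy_with_logits(
        student_logits_clicked, labels_clicked)
    pseudo_loss = F.binary_cross_entropy_with_logits(
        student_logits_unclicked, teacher_probs_unclicked, reduction="none")
    precision = torch.exp(-log_var_unclicked)
    loss_unclicked = (precision * pseudo_loss + log_var_unclicked).mean()
    return loss_clicked + loss_unclicked

# Usage with random tensors standing in for batches of clicked/unclicked ads:
loss = ukd_style_loss(torch.randn(8), torch.randint(0, 2, (8,)).float(),
                      torch.randn(16), torch.rand(16), torch.zeros(16))
print(loss.item())
```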
With the fast-growing number of classification models being produced every day, numerous model interpretation and comparison solutions have also been introduced. For example, LIME and SHAP can interpret which input features contribute more to a classifier's output predictions. Different numerical metrics (e.g., accuracy) can be used to easily compare two classifiers. However, few works can interpret the contribution of a data feature to one classifier in comparison with its contribution to another classifier. Such comparative interpretation can help to disclose the fundamental difference between two classifiers, select classifiers under different feature conditions, and better ensemble two classifiers. To accomplish this, we propose a learning-from-disagreement (LFD) framework to visually compare two classification models. Specifically, LFD identifies data instances on which the two compared classifiers disagree and trains a discriminator to learn from these disagreed instances. As the two classifiers' training features may not be available, we train the discriminator on a set of meta-features, proposed based on certain hypotheses about the classifiers, to probe their behaviors. By interpreting the trained discriminator with the SHAP values of different meta-features, we provide actionable insights into the compared classifiers. We also introduce multiple metrics to profile the importance of meta-features from different perspectives. With these metrics, one can easily identify the meta-features with the most complementary behaviors in the two classifiers and use them to better ensemble the classifiers. We focus on binary classification models in the financial services and advertising industries to demonstrate the efficacy of our proposed framework and visualizations.
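A compact sketch of the learning-from-disagreement step under simplified assumptions (random stand-in meta-features and predictions, a gradient-boosting discriminator): instances on which the two classifiers disagree are labeled by which classifier predicted positive, and a discriminator is trained on the meta-features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
meta_features = rng.normal(size=(1000, 5))      # stand-in meta-features
pred_a = rng.integers(0, 2, 1000)               # classifier A's predictions
pred_b = rng.integers(0, 2, 1000)               # classifier B's predictions

# Keep only the disagreed instances and label them by which classifier
# said "positive" (1 -> A positive / B negative, 0 -> the reverse).
disagree = pred_a != pred_b
X, y = meta_features[disagree], pred_a[disagree]

disc = GradientBoostingClassifier().fit(X, y)
print("meta-feature importances:", disc.feature_importances_.round(3))
# The trained discriminator could then be interpreted with SHAP values of
# the meta-features, as the abstract describes.
```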
Due to the constraints of imaging devices and the high cost of operation time, computed tomography (CT) scans are usually acquired with low inter-slice resolution. Improving the inter-slice resolution benefits disease diagnosis for both human experts and computer-aided systems. To this end, this paper builds a novel medical slice synthesis framework to increase the inter-slice resolution. Considering that ground-truth intermediate medical slices are always absent in clinical practice, we introduce an incremental cross-view mutual distillation strategy to accomplish this task in a self-supervised manner. Specifically, we model this problem from three different views: slice-wise interpolation from the axial view and pixel-wise interpolation from the coronal and sagittal views. In this setting, the models learned from different views can distill valuable knowledge to guide each other's learning. We can repeat this process to make the models synthesize intermediate slice data with increasing inter-slice resolution. To demonstrate the effectiveness of the proposed approach, we conduct comprehensive experiments on a large-scale CT dataset. Quantitative and qualitative comparison results show that our method outperforms state-of-the-art algorithms by clear margins.
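A toy sketch of the mutual distillation idea, assuming three view-specific models that synthesize the same intermediate slice: each prediction is pulled toward the average of the other two views. The incremental schedule and the actual interpolation networks are omitted.

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(pred_axial, pred_coronal, pred_sagittal):
    """Toy consistency objective standing in for cross-view mutual
    distillation: each view's synthesized slice is regressed toward the
    (detached) average of the other two views' predictions."""
    preds = [pred_axial, pred_coronal, pred_sagittal]
    loss = 0.0
    for i, p in enumerate(preds):
        # Treat the average of the other two views as a soft teacher.
        teacher = torch.stack([q for j, q in enumerate(preds) if j != i]).mean(0)
        loss = loss + F.l1_loss(p, teacher.detach())
    return loss / len(preds)

# Usage with random tensors standing in for a synthesized 256x256 slice:
slices = [torch.rand(1, 1, 256, 256) for _ in range(3)]
print(mutual_distillation_loss(*slices).item())
```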
Knowledge Graphs (KGs), which represent facts as triples, have been widely adopted in many applications. Reasoning tasks such as link prediction and rule induction are important for the development of KGs. Knowledge Graph Embeddings (KGEs), which embed entities and relations of a KG into continuous vector spaces, have been proposed for these reasoning tasks and proven to be efficient and robust. However, the feasibility of applying and deploying KGEs in real-world applications has not been well explored. In this paper, we discuss and report our experiences of deploying KGEs in a real domain application: e-commerce. We first identify three important desiderata for e-commerce KG systems: 1) attentive reasoning, reasoning over a few target relations of greater concern instead of all relations; 2) explanation, providing explanations for a prediction to help both users and business operators understand why the prediction is made; 3) transferable rules, generating reusable rules to accelerate the deployment of a KG to new systems. As no existing KGE meets all these desiderata, we propose a novel one: an explainable knowledge graph attention network that makes predictions by modeling correlations between triples rather than relying purely on the head entity, relation, and tail entity embeddings. It automatically selects attentive triples for prediction and simultaneously records their contributions, from which explanations can be easily provided and transferable rules can be efficiently produced. We empirically show that our method is capable of meeting all three desiderata in our e-commerce application and outperforms typical baselines on datasets from real domain applications.
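A toy illustration of attentive reasoning over neighboring triples, with random embeddings standing in for a trained model: the attention weights both drive the prediction and can be read off as an explanation. The dimensions and scoring function are assumptions, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 32

# Predict a target relation for a head entity by attending over the triples
# in its neighborhood; the attention weights indicate which triples support
# the prediction and can be surfaced as an explanation.
query = torch.randn(d)                    # embedding of (head, target relation)
neighbor_triples = torch.randn(5, d)      # embeddings of 5 neighboring triples

attn = F.softmax(neighbor_triples @ query / d ** 0.5, dim=0)
context = attn @ neighbor_triples         # attentive aggregation
score = torch.sigmoid(context @ query)    # plausibility of the candidate link

top = attn.argmax().item()
print(f"predicted score {score.item():.3f}; most supportive triple: #{top} "
      f"(weight {attn[top].item():.2f})")
```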
We study the classification of animal behavior using accelerometry data through various recurrent neural network (RNN) models. We evaluate the classification performance and complexity of the considered models, which feature long short-term memory (LSTM) or gated recurrent unit (GRU) architectures with varying depths and widths, using four datasets acquired from cattle via collar or ear tags. We also include two state-of-the-art convolutional neural network (CNN)-based time-series classification models in the evaluations. The results show that the RNN-based models can achieve similar or higher classification accuracy compared with the CNN-based models while having lower computational and memory requirements. We also observe that the models with the GRU architecture generally outperform those with the LSTM architecture in terms of classification accuracy despite being less complex. A single-layer uni-directional GRU model with 64 hidden units appears to offer a good balance between accuracy and complexity, making it suitable for implementation on edge/embedded devices.
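A minimal PyTorch sketch of the configuration highlighted above (a single-layer uni-directional GRU with 64 hidden units plus a linear classifier); the window length, number of behaviour classes, and input features are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GRUBehaviourClassifier(nn.Module):
    """Single-layer uni-directional GRU with 64 hidden units followed by a
    linear head; the input dimension (3 accelerometer axes) and the number
    of behaviour classes are placeholder assumptions."""
    def __init__(self, n_features=3, hidden=64, n_classes=5):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, n_features)
        _, h_n = self.gru(x)              # h_n: (1, batch, hidden)
        return self.head(h_n[-1])         # class logits

logits = GRUBehaviourClassifier()(torch.randn(8, 100, 3))
print(logits.shape)  # torch.Size([8, 5])
```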
We develop an end-to-end deep-neural-network-based algorithm for classifying animal behavior using accelerometry data on the embedded system of an artificial intelligence of things (AIoT) device installed in a wearable collar tag. The proposed algorithm jointly performs feature extraction and classification utilizing a set of infinite-impulse-response (IIR) and finite-impulse-response (FIR) filters together with a multilayer perceptron. The utilized IIR and FIR filters can be viewed as specific types of recurrent and convolutional neural network layers, respectively. We evaluate the performance of the proposed algorithm via two real-world datasets collected from grazing cattle. The results show that the proposed algorithm offers good intra- and inter-dataset classification accuracy and outperforms its closest contenders, including two state-of-the-art convolutional-neural-network-based time-series classification algorithms that are significantly more complex. We implement the proposed algorithm on the embedded system of the collar tag's AIoT device to perform in-situ classification of animal behavior. We achieve real-time in-situ behavior inference from accelerometry data without imposing any strain on the available computational, memory, or energy resources of the embedded system.
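A rough sketch of the filter-then-classify idea, with an FIR stage expressed as a 1-D convolution, a first-order IIR stage expressed as a learnable recurrence, and a small MLP classifier; the filter orders, layer sizes, and temporal pooling are illustrative assumptions rather than the proposed algorithm's actual configuration.

```python
import torch
import torch.nn as nn

class FilterMLP(nn.Module):
    """Toy feature-extraction-plus-classification pipeline: an FIR stage
    (1-D convolution) and a first-order IIR stage (learnable recurrence)
    filter raw accelerometry, and a small MLP produces class logits."""
    def __init__(self, n_axes=3, n_classes=5):
        super().__init__()
        self.fir = nn.Conv1d(n_axes, 8, kernel_size=9, padding=4)
        self.a = nn.Parameter(torch.full((8,), 0.5))  # IIR feedback coefficients
        self.mlp = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, n_classes))

    def forward(self, x):                  # x: (batch, n_axes, time)
        h = torch.tanh(self.fir(x))        # FIR filtering
        y = torch.zeros_like(h[..., 0])
        outs = []
        for t in range(h.shape[-1]):       # y[t] = a * y[t-1] + h[t]
            y = self.a * y + h[..., t]
            outs.append(y)
        feats = torch.stack(outs, dim=-1).mean(dim=-1)  # temporal pooling
        return self.mlp(feats)

print(FilterMLP()(torch.randn(4, 3, 128)).shape)  # torch.Size([4, 5])
```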