Multi-source unsupervised domain adaptation (MS-UDA) for sentiment analysis (SA) aims to leverage useful information in multiple source domains to help do SA in an unlabeled target domain that has no supervised information. Existing algorithms of MS-UDA either only exploit the shared features, i.e., the domain-invariant information, or based on some weak assumption in NLP, e.g., smoothness assumption. To avoid these problems, we propose two transfer learning frameworks based on the multi-source domain adaptation methodology for SA by combining the source hypotheses to derive a good target hypothesis. The key feature of the first framework is a novel Weighting Scheme based Unsupervised Domain Adaptation framework (WS-UDA), which combine the source classifiers to acquire pseudo labels for target instances directly. While the second framework is a Two-Stage Training based Unsupervised Domain Adaptation framework (2ST-UDA), which further exploits these pseudo labels to train a target private extractor. Importantly, the weights assigned to each source classifier are based on the relations between target instances and source domains, which measured by a discriminator through the adversarial training. Furthermore, through the same discriminator, we also fulfill the separation of shared features and private features. Experimental results on two SA datasets demonstrate the promising performance of our frameworks, which outperforms unsupervised state-of-the-art competitors.
Person re-identification (ReID) is an essential cross-camera retrieval task to identify pedestrians. However, the photo number of each pedestrian usually differs drastically, and thus the data limitation and imbalance problem hinders the prediction accuracy greatly. Additionally, in real-world applications, pedestrian images are captured by different surveillance cameras, so the noisy camera related information, such as the lights, perspectives and resolutions, result in inevitable domain gaps for ReID algorithms. These challenges bring difficulties to current deep learning methods with triplet loss for coping with such problems. To address these challenges, this paper proposes ReadNet, an adversarial camera network (ACN) with an angular triplet loss (ATL). In detail, ATL focuses on learning the angular distance among different identities to mitigate the effect of data imbalance, and guarantees a linear decision boundary as well, while ACN takes the camera discriminator as a game opponent of feature extractor to filter camera related information to bridge the multi-camera gaps. ReadNet is designed to be flexible so that either ATL or ACN can be deployed independently or simultaneously. The experiment results on various benchmark datasets have shown that ReadNet can deliver better prediction performance than current state-of-the-art methods.
Mutual Information (MI) plays an important role in representation learning. However, MI is unfortunately intractable in continuous and high-dimensional settings. Recent advances establish tractable and scalable MI estimators to discover useful representation. However, most of the existing methods are not capable of providing an accurate estimation of MI with low-variance when the MI is large. We argue that directly estimating the gradients of MI is more appealing for representation learning than estimating MI in itself. To this end, we propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions. MIGE exhibits a tight and smooth gradient estimation of MI in the high-dimensional and large-MI settings. We expand the applications of MIGE in both unsupervised learning of deep representations based on InfoMax and the Information Bottleneck method. Experimental results have indicated significant performance improvement in learning useful representation.
Variational autoencoders have been widely applied for natural language generation, however, there are two long-standing problems: information under-representation and posterior collapse. The former arises from the fact that only the last hidden state from the encoder is transformed to the latent space, which is insufficient to summarize data. The latter comes as a result of the imbalanced scale between the reconstruction loss and the KL divergence in the objective function. To tackle these issues, in this paper we propose the discrete variational attention model with categorical distribution over the attention mechanism owing to the discrete nature in languages. Our approach is combined with an auto-regressive prior to capture the sequential dependency from observations, which can enhance the latent space for language generation. Moreover, thanks to the property of discreteness, the training of our proposed approach does not suffer from posterior collapse. Furthermore, we carefully analyze the superiority of discrete latent space over the continuous space with the common Gaussian distribution. Extensive experiments on language generation demonstrate superior advantages of our proposed approach in comparison with the state-of-the-art counterparts.
Deep discriminative models (e.g. deep regression forests, deep Gaussian process) have been extensively studied recently to solve problems such as facial age estimation and head pose estimation. Most existing methods pursue to achieve robust and unbiased solutions through either learning more discriminative features, or weighting samples. We argue what is more desirable is to gradually learn to discriminate like our human being, and hence we resort to self-paced learning (SPL). Then, a natural question arises: can self-paced regime guide deep discriminative models to obtain more robust and less unbiased solutions? To this end, this paper proposes a new deep discriminative model--self-paced deep regression forests considering sample uncertainty (SPUDRFs). It builds up a new self-paced learning paradigm: easy and underrepresented samples first. This paradigm could be extended to combine with a variety of deep discriminative models. Extensive experiments on two computer vision tasks, i.e., facial age estimation and head pose estimation, demonstrate the efficacy of SPUDRFs, where state-of-the-art performances are achieved.
Multi-view clustering is an important approach to analyze multi-view data in an unsupervised way. Among various methods, the multi-view subspace clustering approach has gained increasing attention due to its encouraging performance. Basically, it integrates multi-view information into graphs, which are then fed into spectral clustering algorithm for final result. However, its performance may degrade due to noises existing in each individual view or inconsistency between heterogeneous features. Orthogonal to current work, we propose to fuse multi-view information in a partition space, which enhances the robustness of Multi-view clustering. Specifically, we generate multiple partitions and integrate them to find the shared partition. The proposed model unifies graph learning, generation of basic partitions, and view weight learning. These three components co-evolve towards better quality outputs. We have conducted comprehensive experiments on benchmark datasets and our empirical results verify the effectiveness and robustness of our approach.
Leveraging on the underlying low-dimensional structure of data, low-rank and sparse modeling approaches have achieved great success in a wide range of applications. However, in many applications the data can display structures beyond simply being low-rank or sparse. Fully extracting and exploiting hidden structure information in the data is always desirable and favorable. To reveal more underlying effective manifold structure, in this paper, we explicitly model the data relation. Specifically, we propose a structure learning framework that retains the pairwise similarities between the data points. Rather than just trying to reconstruct the original data based on self-expression, we also manage to reconstruct the kernel matrix, which functions as similarity preserving. Consequently, this technique is particularly suitable for the class of learning problems that are sensitive to sample similarity, e.g., clustering and semisupervised classification. To take advantage of representation power of deep neural network, a deep auto-encoder architecture is further designed to implement our model. Extensive experiments on benchmark data sets demonstrate that our proposed framework can consistently and significantly improve performance on both evaluation tasks. We conclude that the quality of structure learning can be enhanced if similarity information is incorporated.
Representing words and phrases into dense vectors of real numbers which encode semantic and syntactic properties is a vital constituent in natural language processing (NLP). The success of neural network (NN) models in NLP largely rely on such dense word representations learned on the large unlabeled corpus. Sindhi is one of the rich morphological language, spoken by large population in Pakistan and India lacks corpora which plays an essential role of a test-bed for generating word embeddings and developing language independent NLP systems. In this paper, a large corpus of more than 61 million words is developed for low-resourced Sindhi language for training neural word embeddings. The corpus is acquired from multiple web-resources using web-scrappy. Due to the unavailability of open source preprocessing tools for Sindhi, the prepossessing of such large corpus becomes a challenging problem specially cleaning of noisy data extracted from web resources. Therefore, a preprocessing pipeline is employed for the filtration of noisy text. Afterwards, the cleaned vocabulary is utilized for training Sindhi word embeddings with state-of-the-art GloVe, Skip-Gram (SG), and Continuous Bag of Words (CBoW) word2vec algorithms. The intrinsic evaluation approach of cosine similarity matrix and WordSim-353 are employed for the evaluation of generated Sindhi word embeddings. Moreover, we compare the proposed word embeddings with recently revealed Sindhi fastText (SdfastText) word representations. Our intrinsic evaluation results demonstrate the high quality of our generated Sindhi word embeddings using SG, CBoW, and GloVe as compare to SdfastText word representations.
A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years. Researchers manage to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms, typically have a quadratic or even cubic complexity, are inefficient and inherently difficult to apply at large scales. In the era of big data, the computational issue becomes critical. To fill this gap, we propose a large-scale MVSC (LMVSC) algorithm with linear order complexity. Inspired by the idea of anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that we can implement spectral clustering on a smaller graph. Interestingly, it turns out that our model also applies to single-view scenario. Extensive experiments on various large-scale benchmark data sets validate the effectiveness and efficiency of our approach with respect to state-of-the-art clustering methods.