"Topic": models, code, and papers

Multi-view redescription mining using tree-based multi-target prediction models

Jun 22, 2020
Matej Mihelčić, Sašo Džeroski, Tomislav Šmuc

The task of redescription mining is concerned with re-describing different subsets of entities in a dataset and revealing non-trivial associations between different subsets of attributes, called views. This interesting and challenging task is encountered in different scientific fields and is addressed by a number of approaches that obtain redescriptions and allow for the exploration and analysis of attribute associations. The main limitation of existing approaches is their inability to use more than two views; our work alleviates this drawback. We present a memory-efficient, extensible multi-view redescription mining framework that can relate multiple (i.e., more than two) disjoint sets of attributes describing one set of entities. The framework includes: a) the use of random forests of Predictive Clustering trees, with and without random output selection, and random forests of Extra Predictive Clustering trees, b) the use of Extra Predictive Clustering trees as the main rule-generation mechanism, and c) the use of random view-subset projections. We provide multiple performance analyses of the proposed framework and demonstrate its usefulness in increasing the understanding of different machine learning models, a topic of growing importance in machine learning and in particular in explainable data science.
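The abstract gives no formulas, but redescription quality in this literature is commonly scored by the Jaccard index of the entity sets matched by each view's query. A minimal pure-Python sketch of that measure, generalized to more than two views (all data and queries below are hypothetical, not from the paper):

```python
# Toy sketch of multi-view redescription quality (not the authors' code).
# A redescription is a tuple of queries, one per view; its accuracy is
# commonly measured by the Jaccard index of the entities each query matches.

def support(query, view):
    """Set of entity ids whose attributes in `view` satisfy `query`."""
    return {eid for eid, attrs in view.items() if query(attrs)}

def multi_view_jaccard(queries, views):
    """Jaccard index generalized to >2 views:
    |intersection of supports| / |union of supports|."""
    supports = [support(q, v) for q, v in zip(queries, views)]
    inter = set.intersection(*supports)
    union = set.union(*supports)
    return len(inter) / len(union) if union else 0.0

# Three views describing the same five entities (hypothetical data).
view_a = {1: {"x": 3}, 2: {"x": 7}, 3: {"x": 8}, 4: {"x": 1}, 5: {"x": 9}}
view_b = {1: {"y": 0}, 2: {"y": 5}, 3: {"y": 6}, 4: {"y": 2}, 5: {"y": 7}}
view_c = {1: {"z": 1}, 2: {"z": 1}, 3: {"z": 1}, 4: {"z": 0}, 5: {"z": 0}}

queries = (lambda a: a["x"] > 5, lambda a: a["y"] > 4, lambda a: a["z"] == 1)
score = multi_view_jaccard(queries, (view_a, view_b, view_c))
```

The framework's rule-generation machinery (Predictive Clustering trees, view projections) is about finding good queries; a set measure like this one is what decides whether the resulting tuple of queries counts as a redescription.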

Exploiting Contextual Information with Deep Neural Networks

Jun 21, 2020
Ismail Elezi

Context matters! Nevertheless, there has not been much research on exploiting contextual information in deep neural networks. For the most part, the use of contextual information has been limited to recurrent neural networks. Attention models and capsule networks are two recent ways of introducing contextual information in non-recurrent models; however, both of these algorithms were developed after this work started. In this thesis, we show that contextual information can be exploited in two fundamentally different ways: implicitly and explicitly. In the DeepScore project, where the use of context is very important for the recognition of many tiny objects, we show that by carefully crafting convolutional architectures we can achieve state-of-the-art results, while also being able to implicitly and correctly distinguish between objects which are virtually identical but have different meanings depending on their surroundings. In parallel, we show that by explicitly designing algorithms (motivated by graph theory and game theory) that take into consideration the entire structure of the dataset, we can achieve state-of-the-art results in different topics such as semi-supervised learning and similarity learning. To the best of our knowledge, we are the first to integrate graph-theoretical modules, carefully crafted for the problem of similarity learning and designed to consider contextual information, not only outperforming other models but also gaining a speed improvement while using a smaller number of parameters.

* Ph.D. thesis 

Determinantal Point Processes in Randomized Numerical Linear Algebra

May 07, 2020
Michał Dereziński, Michael W. Mahoney

Randomized Numerical Linear Algebra (RandNLA) uses randomness to develop improved algorithms for matrix problems that arise in scientific computing, data science, machine learning, etc. Determinantal Point Processes (DPPs), a seemingly unrelated topic in pure and applied mathematics, are a class of stochastic point processes whose probability distribution is characterized by sub-determinants of a kernel matrix. Recent work has uncovered deep and fruitful connections between DPPs and RandNLA which lead to new guarantees and improved algorithms of interest to both areas. We provide an overview of this exciting new line of research, including brief introductions to RandNLA and DPPs, as well as applications of DPPs to classical linear algebra tasks such as least squares regression, low-rank approximation, and the Nyström method. For example, random sampling with a DPP leads to new kinds of unbiased estimators for least squares, enabling a more refined statistical and inferential understanding of these algorithms; a DPP is, in some sense, an optimal randomized algorithm for the Nyström method; and a RandNLA technique called leverage score sampling can be derived as the marginal distribution of a DPP. We also discuss recent algorithmic developments, illustrating that, while not quite as efficient as standard RandNLA techniques, DPP-based algorithms are only moderately more expensive.
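As a concrete anchor for the L-ensemble view of DPPs (a standard formulation, not specific to this survey): each subset S of the ground set gets probability proportional to det(L_S), and the normalizing constant is det(L + I). A tiny brute-force check of that identity, with a made-up 3x3 kernel:

```python
# Small illustration: an L-ensemble DPP assigns each subset S of {0..n-1}
# probability det(L_S) / det(L + I).  We verify the normalization identity
#   sum over all S of det(L_S) = det(L + I)
# by brute-force enumeration on a tiny kernel.
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion (fine for tiny matrices)."""
    n = len(m)
    if n == 0:
        return 1.0          # det of the empty matrix, needed for S = {}
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += ((-1) ** j) * m[0][j] * det(minor)
    return total

def submatrix(L, idx):
    return [[L[i][j] for j in idx] for i in idx]

# A symmetric positive semi-definite kernel (hypothetical values).
L = [[2.0, 0.5, 0.1],
     [0.5, 1.0, 0.3],
     [0.1, 0.3, 1.5]]

n = len(L)
Z = sum(det(submatrix(L, idx))
        for k in range(n + 1)
        for idx in combinations(range(n), k))
LI = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
assert abs(Z - det(LI)) < 1e-6
```

Exact enumeration is exponential in n, which is exactly why the sampling algorithms surveyed here matter; the identity is only cheap to verify at toy scale.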

Neural Inheritance Relation Guided One-Shot Layer Assignment Search

Feb 28, 2020
Rang Meng, Weijie Chen, Di Xie, Yuan Zhang, Shiliang Pu

Layer assignment is seldom treated as an independent research topic in neural architecture search. In this paper, for the first time, we systematically investigate the impact of different layer assignments on network performance by building an architecture dataset of layer assignments on CIFAR-100. Analyzing this dataset, we discover a neural inheritance relation among networks with different layer assignments: the optimal layer assignments for deeper networks always inherit from those for shallower networks. Inspired by this relation, we propose an efficient one-shot layer assignment search approach via inherited sampling. Specifically, the optimal layer assignment found in a shallow network can serve as a strong sampling prior for training and searching deeper ones in the supernet, which dramatically reduces the search space. Comprehensive experiments on CIFAR-100 illustrate the efficiency of the proposed method; our search results are strongly consistent with the optimal ones selected directly from the architecture dataset. To further confirm the generalization of the method, we also conduct experiments on Tiny-ImageNet and ImageNet, where the searched results are remarkably superior to handcrafted ones under unchanged computational budgets. The neural inheritance relation discovered in this paper can provide insights into universal neural architecture search.

* AAAI2020 

Characterizing Collective Attention via Descriptor Context in Public Discussions of Crisis Events

Sep 19, 2019
Ian Stewart, Diyi Yang, Jacob Eisenstein

Collective attention is central to the spread of real world news and the key to understanding how public discussions report emerging topics and breaking news. Most research measures collective attention via activity metrics such as post volume. While useful, this kind of metric obscures the nuanced content side of collective attention, which may reflect how breaking events are perceived by the public. In this work, we conduct a large-scale language analysis of public online discussions of breaking crisis events on Facebook and Twitter. Specifically, we examine how people refer to locations of hurricanes in their discussion with or without contextual information (e.g. writing "San Juan" vs. "San Juan, Puerto Rico") and how such descriptor expressions are added or omitted in correlation with social factors including relative time, audience and additional information requirements. We find that authors' references to locations are influenced by both macro-level factors such as the location's global importance and micro-level social factors like audience characteristics, and there is a decrease in descriptor context use over time at a collective level as well as at an individual-author level. Our results provide insight that can help crisis event analysts to better predict the public's understanding of news events and to determine how to share information during such events.

* under review 
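The descriptor-context measurement reduces to simple counting over mentions. The posts below are invented stand-ins, not the study's Facebook or Twitter data:

```python
# Minimal sketch (hypothetical data): how often a location is mentioned
# WITH descriptor context ("San Juan, Puerto Rico") versus without
# ("San Juan"), bucketed by days since event onset.
from collections import defaultdict

def context_rate_by_day(posts, short_form, long_form):
    """Fraction of location mentions per day that include the descriptor."""
    with_ctx, total = defaultdict(int), defaultdict(int)
    for day, text in posts:
        if short_form in text:
            total[day] += 1
            if long_form in text:
                with_ctx[day] += 1
    return {d: with_ctx[d] / total[d] for d in total}

posts = [
    (0, "Praying for San Juan, Puerto Rico tonight"),
    (0, "San Juan, Puerto Rico is without power"),
    (0, "Flights to San Juan cancelled"),
    (2, "San Juan, Puerto Rico update"),
    (2, "Volunteers heading to San Juan"),
    (4, "San Juan still needs supplies"),
    (4, "More rain over San Juan today"),
]

rates = context_rate_by_day(posts, "San Juan", "San Juan, Puerto Rico")
# In this toy corpus descriptor use drops over time: 2/3, then 1/2, then 0.
```

The paper's analysis is of course far richer (audience and author-level factors, large-scale corpora), but the per-day rate above is the basic quantity whose decline it documents.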

Uncovering Sociological Effect Heterogeneity using Machine Learning

Sep 18, 2019
Jennie E. Brand, Jiahui Xu, Bernard Koch, Pablo Geraldo

Individuals do not respond uniformly to treatments, events, or interventions. Sociologists routinely partition samples into subgroups to explore how the effects of treatments vary by covariates like race, gender, and socioeconomic status. In so doing, analysts determine the key subpopulations based on theoretical priors. Data-driven discoveries are also routine, yet the analyses by which sociologists typically go about them are problematic and seldom move us beyond our expectations and biases to explore new, meaningful subgroups. Emerging machine learning methods allow researchers to explore sources of variation that they may not have previously considered or envisaged. In this paper, we use causal trees to recursively partition the sample and uncover sources of treatment effect heterogeneity. We use honest estimation, splitting the sample into a training sample to grow the tree and an estimation sample to estimate leaf-specific effects. Assessing a central topic in the social inequality literature, college effects on wages, we compare what we learn from conventional approaches for exploring variation in effects to causal trees. Given our use of observational data, we use leaf-specific matching and sensitivity analyses to address confounding and offer interpretations of effects based on observed and unobserved heterogeneity. We encourage researchers to follow similar practices in their work on variation in sociological effects.
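A one-split caricature of the honest-estimation recipe (ours, not the authors' implementation): one part of the sample picks the split that maximizes effect heterogeneity, and a held-out part estimates the leaf-specific effects:

```python
# Stylized one-split "honest" causal tree (illustrative only).  One sample
# chooses the split; the held-out sample estimates leaf treatment effects,
# so the same data never both selects and evaluates a subgroup.

def effect(rows):
    """Difference in mean outcome between treated (d=1) and control (d=0)."""
    t = [y for x, d, y in rows if d == 1]
    c = [y for x, d, y in rows if d == 0]
    return sum(t) / len(t) - sum(c) / len(c)

def honest_split(train, est, thresholds):
    # 1) choose the threshold maximizing effect heterogeneity on `train`
    def hetero(th):
        lo = [r for r in train if r[0] <= th]
        hi = [r for r in train if r[0] > th]
        return abs(effect(lo) - effect(hi))
    best = max(thresholds, key=hetero)
    # 2) estimate leaf effects on the held-out `est` sample
    lo = [r for r in est if r[0] <= best]
    hi = [r for r in est if r[0] > best]
    return best, effect(lo), effect(hi)

# Toy rows (covariate x, treatment d, outcome y): the treatment helps
# only when x > 1, so the true split is at x = 1.
data = [(0, 0, 1.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 1.0),
        (2, 0, 1.0), (2, 1, 3.0), (3, 0, 1.0), (3, 1, 3.0)]
train = data
est = list(data)   # stand-in: in practice these are disjoint random halves
th, eff_lo, eff_hi = honest_split(train, est, thresholds=(0, 1, 2))
```

Real causal trees recurse, penalize variance, and (as the paper stresses for observational data) still need matching and sensitivity analysis within leaves; the two-step structure is the honest-estimation core.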

Harmonized Multimodal Learning with Gaussian Process Latent Variable Models

Aug 14, 2019
Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

Multimodal learning aims to discover the relationship between multiple modalities. It has become an important research topic due to extensive multimodal applications such as cross-modal retrieval. This paper attempts to address the modality heterogeneity problem based on Gaussian process latent variable models (GPLVMs), which represent multimodal data in a common space. Previous multimodal GPLVM extensions generally adopt individual learning schemes for latent representations and kernel hyperparameters, which ignore their intrinsic relationship. To exploit the strong complementarity among different modalities and GPLVM components, we develop a novel learning scheme called Harmonization, in which latent model parameters are jointly learned from each other. Beyond the correlation-fitting or intra-modal structure-preservation paradigms widely used in existing studies, the harmonization is derived in a model-driven manner to encourage agreement between modality-specific GP kernels and the similarity of latent representations. We present a range of multimodal learning models by incorporating the harmonization mechanism into several representative GPLVM-based approaches. Experimental results on four benchmark datasets show that the proposed models outperform strong baselines on cross-modal retrieval tasks, and that the harmonized multimodal learning method is superior in discovering semantically consistent latent representations.
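A drastic simplification of the harmonization idea (the paper's objective is model-driven and jointly learned; the penalty below is merely illustrative): quantify disagreement between each modality's GP kernel and the similarity structure of shared latent points:

```python
# Rough sketch of "kernel agreement" (our own simplification, not the
# paper's objective): penalize disagreement between each modality-specific
# GP kernel and the similarity of the shared latent representations.
from math import exp

def rbf_kernel(X, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[exp(-gamma * sqdist(a, b)) for b in X] for a in X]

def disagreement(K1, K2):
    """Squared Frobenius-norm difference between two Gram matrices."""
    return sum((K1[i][j] - K2[i][j]) ** 2
               for i in range(len(K1)) for j in range(len(K1)))

# Hypothetical shared latent points and two modality-specific kernels.
Z = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
K_latent = rbf_kernel(Z, gamma=1.0)
K_mod1 = rbf_kernel(Z, gamma=1.0)   # modality 1 agrees with the latent space
K_mod2 = rbf_kernel(Z, gamma=4.0)   # modality 2 uses a mismatched kernel

harm_penalty = disagreement(K_latent, K_mod1) + disagreement(K_latent, K_mod2)
```

In the actual models the kernels and latent points are learned jointly under the GPLVM likelihood rather than compared post hoc, but this is the kind of agreement the harmonization term rewards.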

SCAR: Spatial-/Channel-wise Attention Regression Networks for Crowd Counting

Aug 10, 2019
Junyu Gao, Qi Wang, Yuan Yuan

Crowd counting has recently become a hot topic in crowd analysis. Many CNN-based counting algorithms attain good performance; however, these methods focus only on the local appearance features of crowd scenes and ignore large-range pixel-wise contextual and crowd attention information. To remedy these problems, in this paper we introduce Spatial-/Channel-wise Attention Models into a traditional regression CNN to estimate the density map, a design we name "SCAR". It consists of two modules, namely the Spatial-wise Attention Model (SAM) and the Channel-wise Attention Model (CAM). The former encodes the pixel-wise context of the entire image to predict density maps more accurately at the pixel level. The latter extracts more discriminative features among different channels, which helps the model attend to the head region, the core of crowd scenes; intuitively, CAM alleviates mistaken estimation in background regions. Finally, the two types of attention information and the traditional CNN's feature maps are integrated by a concatenation operation. Extensive experiments are conducted on four popular datasets: Shanghai Tech Part A/B, GCC, and the UCF_CC_50 dataset. The results show that the proposed method achieves state-of-the-art results.

* accepted by Neurocomputing 
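The channel-attention ingredient can be sketched without learned weights. This squeeze-and-gate toy is only in the spirit of CAM; the real SCAR modules use learned convolutional layers:

```python
# Back-of-the-envelope channel-wise attention (in the spirit of CAM; the
# actual SCAR modules learn their gates via convolutions, omitted here).
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def channel_attention(fmap):
    """fmap: C x H x W nested lists.  Squeeze each channel to its global
    average, gate it through a sigmoid, and rescale the whole channel."""
    gates = []
    for ch in fmap:
        avg = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gates.append(sigmoid(avg))
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(fmap, gates)]
    return scaled, gates

# Two 2x2 channels: a strongly responding "head region" channel and a
# weakly responding background-like channel (hypothetical activations).
fmap = [[[4.0, 4.0], [4.0, 4.0]],
        [[-4.0, -4.0], [-4.0, -4.0]]]
out, gates = channel_attention(fmap)
# The strong channel is kept (gate near 1); the weak one is suppressed.
```

This illustrates why channel gating steers the regressor toward head-region features: channels with weak aggregate response are down-weighted before the density map is regressed.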

Optimal Randomness in Swarm-based Search

May 07, 2019
Jiamin Wei, Yangquan Chen, Yongguang Yu, Yuquan Chen

Swarm-based search has been a hot topic for a long time. Among the proposed algorithms, Cuckoo search (CS) has proven to be an efficient approach to global optimum searching due to its combination of Lévy flights, local search capabilities, and guaranteed global convergence. CS uses Lévy flights, generated from the Lévy distribution, a heavy-tailed probability distribution, in its global random walk to explore the search space. In this case, large steps are more likely to be generated, which plays an important role in enhancing search capability. Although the movements of many foragers and wandering animals have been shown to follow a Lévy distribution, investigation into the impact of different heavy-tailed probability distributions on CS remains insufficient. In this paper, four different types of commonly used heavy-tailed distributions, namely the Mittag-Leffler distribution, Pareto distribution, Cauchy distribution, and Weibull distribution, are considered to enhance the searching ability of CS. Four novel CS algorithms are then proposed, and experiments are carried out on 20 benchmark functions to compare their search performance. Finally, the proposed methods are applied to system identification to demonstrate their effectiveness.
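Swapping the step-length distribution is essentially a one-line change in the global random walk. A sketch using inverse-CDF Pareto sampling, one of the four distributions considered (parameter values are arbitrary; this is not the paper's full CS algorithm):

```python
# Sketch of a heavy-tailed global random walk of the kind used in
# Cuckoo-style search (illustrative; not the paper's full algorithm).
# Inverse-CDF sampling gives Pareto step lengths:
#   x = x_m * u**(-1/alpha)  for  u ~ Uniform(0, 1].
import random

def pareto_step(x_m, alpha, rng):
    u = 1.0 - rng.random()          # in (0, 1]; avoids u == 0
    return x_m * (u ** (-1.0 / alpha))

def random_walk(start, n_steps, x_m, alpha, seed=0):
    rng = random.Random(seed)
    pos, path = start, [start]
    for _ in range(n_steps):
        direction = rng.choice((-1.0, 1.0))
        pos += direction * pareto_step(x_m, alpha, rng)
        path.append(pos)
    return path

path = random_walk(0.0, 200, x_m=0.01, alpha=1.5, seed=42)
steps = [abs(b - a) for a, b in zip(path, path[1:])]
# Every Pareto step is at least x_m; the heavy tail occasionally produces
# very large jumps, which is what drives wide exploration of the space.
```

Replacing `pareto_step` with a Cauchy, Weibull, or Mittag-Leffler sampler yields the other variants the paper compares; the surrounding walk is unchanged.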

Polarity Loss for Zero-shot Object Detection

Nov 22, 2018
Shafin Rahman, Salman Khan, Nick Barnes

Zero-shot object detection is an emerging research topic that aims to recognize and localize previously 'unseen' objects. This setting gives rise to several unique challenges, e.g., a highly imbalanced positive-to-negative instance ratio, ambiguity between background and unseen classes, and the proper alignment between visual and semantic concepts. Here, we propose an end-to-end deep learning framework underpinned by a novel loss function that puts more emphasis on difficult examples to counter class imbalance. We call our objective the 'Polarity loss' because it explicitly maximizes the gap between positive and negative predictions. Such a margin-maximizing formulation is important as it improves the visual-semantic alignment while resolving the ambiguity between background and unseen classes. Our approach is inspired by embodiment theories in cognitive science, which claim that human semantic understanding is grounded in past experiences (seen objects), related linguistic concepts (word dictionaries), and the perception of the physical world (visual imagery). To this end, we learn to attend to a dictionary of related semantic concepts that eventually refines the noisy semantic embeddings and helps establish a better synergy between the visual and semantic domains. Our extensive results on the MS-COCO and Pascal VOC datasets show as much as 14x mAP improvement over the state of the art.
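The abstract does not state the exact loss, so the following is a didactic stand-in rather than the paper's Polarity loss: a cross-entropy term reweighted by a penalty that decays as the gap between the positive score and the best negative score grows, capturing the margin-maximizing intent:

```python
# Didactic stand-in for a margin-maximizing "polarity" objective (the
# paper's exact loss differs): cross-entropy on the positive prediction,
# reweighted by a factor that is large when the positive-vs-negative
# score gap is small and fades as the gap widens.
from math import exp, log

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def polarity_style_loss(pos_score, neg_scores, beta=5.0):
    p = sigmoid(pos_score)
    ce = -log(p)                          # fit the positive prediction
    gap = pos_score - max(neg_scores)     # positive-vs-negative margin
    polarity = sigmoid(-beta * gap)       # near 1 when the gap is small
    return ce * (1.0 + polarity)

small_gap = polarity_style_loss(1.0, [0.9, 0.5])
large_gap = polarity_style_loss(1.0, [-2.0, -3.0])
# Same positive confidence, but the small-gap case is penalized harder,
# pushing the model to separate unseen-class scores from background.
assert large_gap < small_gap
```

Any loss with this shape rewards widening the positive-negative margin, which is the property the paper argues resolves the background-versus-unseen ambiguity.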
