"Topic": models, code, and papers

Multi-modal multi-objective model-based genetic programming to find multiple diverse high-quality models

Mar 24, 2022
E. M. C. Sijben, T. Alderliesten, P. A. N. Bosman

Explainable artificial intelligence (XAI) is an important and rapidly expanding research topic. The goal of XAI is to gain trust in a machine learning (ML) model through clear insights into how the model arrives at its predictions. Genetic programming (GP) is often cited as being uniquely well-suited to contribute to XAI because of its capacity to learn (small) symbolic models that have the potential to be interpreted. Nevertheless, like many ML algorithms, GP typically results in a single best model. In practice, however, the best model in terms of training error may well not be the most suitable one as judged by a domain expert, for various reasons: overfitting, the existence of multiple different models with similar accuracy, and unwanted errors on particular data points caused by typical accuracy measures such as mean squared error. Hence, to increase the chances that domain experts deem a resulting model plausible, it becomes important to be able to explicitly search for multiple, diverse, high-quality models that trade off different meanings of accuracy. In this paper, we achieve exactly this with a novel multi-modal multi-tree multi-objective GP approach that extends a modern model-based GP algorithm known as GP-GOMEA that is already effective at searching for small expressions.

Label-efficient Hybrid-supervised Learning for Medical Image Segmentation

Mar 10, 2022
Junwen Pan, Qi Bi, Yanzhan Yang, Pengfei Zhu, Cheng Bian

Due to the lack of expertise for medical image annotation, the investigation of label-efficient methodology for medical image segmentation has become a hot topic. Recent progress focuses on the efficient utilization of weak annotations together with a few strongly-annotated labels so as to achieve comparable segmentation performance in many unprofessional scenarios. However, these approaches concentrate only on the supervision inconsistency between strongly- and weakly-annotated instances and ignore the instance inconsistency inside the weakly-annotated instances, which inevitably leads to performance degradation. To address this problem, we propose a novel label-efficient hybrid-supervised framework, which considers each weakly-annotated instance individually and learns its weight guided by the gradient direction of the strongly-annotated instances, so that the high-quality prior in the strongly-annotated instances is better exploited and the weakly-annotated instances are depicted more precisely. Specifically, our dynamic instance indicator (DII) realizes the above objectives and is further adapted to our dynamic co-regularization (DCR) framework to alleviate the accumulation of errors caused by distortions in the weak annotations. Extensive experiments on two hybrid-supervised medical segmentation datasets demonstrate that with only 10% strong labels, the proposed framework can leverage the weak labels efficiently and achieve competitive performance against the 100% strong-label supervised scenario.

* Accepted to AAAI 2022 

A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges

Mar 02, 2022
Wenxuan Zhang, Xin Li, Yang Deng, Lidong Bing, Wai Lam

As an important fine-grained sentiment analysis problem, aspect-based sentiment analysis (ABSA), which aims to analyze and understand people's opinions at the aspect level, has attracted considerable interest in the last decade. To handle ABSA in different scenarios, various tasks have been introduced for analyzing different sentiment elements and their relations, including the aspect term, aspect category, opinion term, and sentiment polarity. Unlike early ABSA works focusing on a single sentiment element, many compound ABSA tasks involving multiple elements have been studied in recent years to capture more complete aspect-level sentiment information. However, a systematic review of the various ABSA tasks and their corresponding solutions is still lacking, a gap we aim to fill with this survey. More specifically, we provide a new taxonomy for ABSA which organizes existing studies along the axes of the sentiment elements concerned, with an emphasis on recent advances in compound ABSA tasks. From the perspective of solutions, we summarize the utilization of pre-trained language models for ABSA, which has raised the performance of ABSA to a new level. In addition, techniques for building more practical ABSA systems in cross-domain/lingual scenarios are discussed. Finally, we review some emerging topics and discuss open challenges to outline potential future directions of ABSA.

Click-Through Rate Prediction in Online Advertising: A Literature Review

Feb 22, 2022
Yanwu Yang, Panyu Zhai

Predicting the probability that a user will click on a specific advertisement has been a prevalent issue in online advertising, attracting much research attention in the past decades. As a hot research frontier driven by industrial needs, recent years have witnessed more and more novel learning models employed to improve advertising CTR prediction. Although extant research provides necessary details on algorithmic design for addressing a variety of specific problems in advertising CTR prediction, the methodological evolution and connections between modeling frameworks are precluded. Moreover, to the best of our knowledge, there are few comprehensive surveys on this topic. We conduct a systematic literature review of state-of-the-art and recent CTR prediction research, with a special focus on modeling frameworks. Specifically, we give a classification of state-of-the-art CTR prediction models in the extant literature, within which basic modeling frameworks and their extensions, advantages and disadvantages, and performance assessment for CTR prediction are presented. Moreover, we summarize CTR prediction models with respect to the complexity and the order of feature interactions, and performance comparisons on various datasets. Furthermore, we identify current research trends, main challenges and potential future directions worthy of further exploration. This review is expected to provide fundamental knowledge and efficient entry points for IS and marketing scholars who want to engage in this area.

* Information Processing & Management, 59(2): 102853 (2022) 
* 85 pages, 12 figures, 9 tables 

Distribution Regression with Sliced Wasserstein Kernels

Feb 08, 2022
Dimitri Meunier, Massimiliano Pontil, Carlo Ciliberto

The problem of learning functions over spaces of probabilities, known as distribution regression, is gaining significant interest in the machine learning community. A key challenge behind this problem is to identify a suitable representation capturing all relevant properties of the underlying functional mapping. A principled approach to distribution regression is provided by kernel mean embeddings, which lift kernel-induced similarity on the input domain to the probability level. This strategy effectively tackles the two-stage sampling nature of the problem, enabling one to derive estimators with strong statistical guarantees, such as universal consistency and excess risk bounds. However, kernel mean embeddings implicitly hinge on the maximum mean discrepancy (MMD), a metric on probabilities, which may fail to capture key geometrical relations between distributions. In contrast, optimal transport (OT) metrics are potentially more appealing, as documented by the recent literature on the topic. In this work, we propose the first OT-based estimator for distribution regression. We build on the Sliced Wasserstein distance to obtain an OT-based representation. We study the theoretical properties of a kernel ridge regression estimator based on such representation, for which we prove universal consistency and excess risk bounds. Preliminary experiments complement our theoretical findings by showing the effectiveness of the proposed approach and comparing it with MMD-based estimators.
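
As an illustration of the Sliced Wasserstein distance mentioned in the abstract above, here is a minimal Monte Carlo sketch in plain Python. It is not the authors' estimator: the function name `sliced_wasserstein`, the number of projections, and the restriction to two equally sized samples are assumptions made for this example. Each random direction reduces the problem to one dimension, where the optimal transport cost between equal-sized samples is obtained by sorting.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between two
    equally sized samples xs and ys (lists of d-dimensional points).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    d = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a direction uniformly on the unit sphere by normalising
        # a standard Gaussian vector.
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both samples onto the direction. In one dimension the
        # optimal coupling of equal-sized samples matches sorted values.
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(a * t for a, t in zip(y, theta)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    # Average the p-th power cost over directions, then take the p-th root.
    return (total / n_projections) ** (1.0 / p)
```

A kernel for distribution regression could then be built on top of such a distance, for example a Gaussian-type kernel of the distance between two bags of samples; the exact construction used in the paper is not reproduced here.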

Heed the Noise in Performance Evaluations in Neural Architecture Search

Feb 04, 2022
Arkadiy Dushatskiy, Tanja Alderliesten, Peter A. N. Bosman

Neural Architecture Search (NAS) has recently become a topic of great interest. However, there is a potentially impactful issue within NAS that remains largely unrecognized: noise. Due to stochastic factors in neural network initialization, training, and the chosen train/validation dataset split, the performance evaluation of a neural network architecture, which is often based on a single learning run, is also stochastic. This may have a particularly large impact if a dataset is small. We therefore propose to reduce the noise by having architecture evaluations comprise averaging of scores over multiple network training runs using different random seeds and cross-validation. We perform experiments for a combinatorial optimization formulation of NAS in which we vary noise reduction levels. We use the same computational budget for each noise level in terms of network training runs, i.e., we allow fewer architecture evaluations when averaging over more training runs. Multiple search algorithms are considered, including evolutionary algorithms, which generally perform well for NAS. We use two publicly available datasets from the medical image segmentation domain, where datasets are often limited and variability among samples is often high. Our results show that reducing noise in architecture evaluations enables all considered search algorithms to find better architectures.
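
The trade-off described in the abstract above (averaging over more training runs leaves fewer architecture evaluations under a fixed budget) can be sketched as follows. This is only an illustration under stated assumptions: `averaged_score`, `architectures_within_budget`, and `toy_evaluate` are hypothetical names, the toy evaluator stands in for actually training a network, and cross-validation is left out.

```python
import random
from statistics import mean


def averaged_score(evaluate, architecture, seeds):
    """Average the score of one architecture over several training runs,
    one run per random seed."""
    return mean([evaluate(architecture, seed) for seed in seeds])


def architectures_within_budget(total_training_runs, runs_per_evaluation):
    """Number of architecture evaluations affordable when the budget is
    counted in network training runs."""
    return total_training_runs // runs_per_evaluation


def toy_evaluate(architecture, seed):
    """Hypothetical stand-in for one training run: an assumed 'true' quality
    plus seed-dependent Gaussian noise."""
    rng = random.Random(seed)
    return architecture["quality"] + rng.gauss(0.0, 0.05)
```

With a budget of 120 training runs, averaging over 4 seeds per architecture allows `architectures_within_budget(120, 4)`, that is 30, architecture evaluations, compared with 120 when each architecture is trained once.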

Modeling Performance in Open-Domain Dialogue with PARADISE

Oct 21, 2021
Marilyn Walker, Colin Harmon, James Graupera, Davan Harrison, Steve Whittaker

There has recently been an explosion of work on spoken dialogue systems, along with an increased interest in open-domain systems that engage in casual conversations on popular topics such as movies, books and music. These systems aim to socially engage, entertain, and even empathize with their users. Since the achievement of such social goals is hard to measure, recent research has used dialogue length or human ratings as evaluation metrics, and developed methods for automatically calculating novel metrics, such as coherence, consistency, relevance and engagement. Here we develop a PARADISE model for predicting the performance of Athena, a dialogue system that has participated in thousands of conversations with real users, while competing as a finalist in the Alexa Prize. We use both user ratings and dialogue length as metrics for dialogue quality, and experiment with predicting these metrics using automatic features that are both system dependent and independent. Our goal is to learn a general objective function that can be used to optimize the dialogue choices of any Alexa Prize system in real time and evaluate its performance. Our best model for predicting user ratings gets an $R^2$ of 0.136 with a DistilBert model, and the best model for predicting length with system independent features gets an $R^2$ of 0.865, suggesting that conversation length may be a more reliable measure for automatic training of dialogue systems.

* The 12th International Workshop on Spoken Dialog System Technology, November 2021 

Impact of COVID-19 Policies and Misinformation on Social Unrest

Oct 07, 2021
Martha Barnard, Radhika Iyer, Sara Y. Del Valle, Ashlynn R. Daughton

The novel coronavirus disease (COVID-19) pandemic has impacted every corner of earth, disrupting governments and leading to socioeconomic instability. This crisis has prompted questions surrounding how different sectors of society interact and influence each other during times of change and stress. Given the unprecedented economic and societal impacts of this pandemic, many new data sources have become available, allowing us to quantitatively explore these associations. Understanding these relationships can help us better prepare for future disasters and mitigate the impacts. Here, we focus on the interplay between social unrest (protests), health outcomes, public health orders, and misinformation in eight countries of Western Europe and four regions of the United States. We created 1-3 week forecasts of both a binary protest metric for identifying times of high protest activity and the overall protest counts over time. We found that for all regions, except Belgium, at least one feature from our various data streams was predictive of protests. However, the accuracy of the protest forecasts varied by country, that is, for roughly half of the countries analyzed, our forecasts outperform a naïve model. These mixed results demonstrate the potential of diverse data streams to predict a topic as volatile as protests as well as the difficulties of predicting a situation that is as rapidly evolving as a pandemic.

* 21 pages, 9 figures 

Prior Omission of Dissimilar Source Domain(s) for Cost-Effective Few-Shot Learning

Sep 11, 2021
Zezhong Wang, Hongru Wang, Kwan Wai Chung, Jia Zhu, Gabriel Pui Cheong Fung, Kam-Fai Wong

Few-shot slot tagging is an emerging research topic in the field of Natural Language Understanding (NLU). With sufficient annotated data from source domains, the key challenge is how to train and adapt the model to another target domain which has only a few labels. Conventional few-shot approaches use all the data from the source domains without considering inter-domain relations and implicitly assume each sample in the domain contributes equally. However, our experiments show that the data distribution bias among different domains significantly affects the adaptation performance. Moreover, transferring knowledge from dissimilar domains can even introduce extra noise that degrades the performance of models. To tackle this problem, we propose an effective similarity-based method to select data from the source domains. In addition, we propose a Shared-Private Network (SP-Net) for the few-shot slot tagging task. Words from the same class share some features. We extract those shared features from the limited annotated data on the target domain and merge them together as the label embedding to help us predict other unlabelled data on the target domain. The experiment shows that our method outperforms the state-of-the-art approaches with less source data. The result also shows that some training data from dissimilar sources are redundant or even harmful for the adaptation.

Bayesian learning of forest and tree graphical models

Aug 31, 2021
Edmund Jones

In Bayesian learning of Gaussian graphical model structure, it is common to restrict attention to certain classes of graphs and approximate the posterior distribution by repeatedly moving from one graph to another, using MCMC or methods such as stochastic shotgun search (SSS). I give two corrected versions of an algorithm for non-decomposable graphs and discuss random graph distributions, in particular as prior distributions. The main topic of the thesis is Bayesian structure-learning with forests or trees. Restricting attention to these graphs can be justified using theorems on random graphs. I describe how to use the Chow–Liu algorithm and the Matrix Tree Theorem to find the MAP forest and certain quantities in the posterior distribution on trees. I give adapted versions of MCMC and SSS for approximating the posterior distribution for forests and trees, and systems for storing these graphs so that it is easy to choose moves to neighbouring graphs. Experiments show that SSS with trees does well when the true graph is a tree or sparse graph. SSS with trees or forests does better than SSS with decomposable graphs in certain cases. Graph priors improve detection of hubs but need large ranges of probabilities. MCMC on forests fails to mix well and MCMC on trees is slower than SSS. (For a longer abstract see the thesis.)

* PhD thesis, 2013, University of Bristol; 148 pages, 24 figures 
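
The Matrix Tree Theorem named in the abstract above can be illustrated with a small sketch, given here in its basic unweighted form: the number of spanning trees of a graph equals the determinant of its Laplacian with one row and the matching column removed. How the thesis applies the theorem to posterior quantities on trees is not reproduced here; the function names `count_spanning_trees` and `determinant` are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def count_spanning_trees(n, edges):
    """Count spanning trees of an undirected simple graph on vertices
    0..n-1 via the Matrix Tree Theorem."""
    # Graph Laplacian: degrees on the diagonal, -1 for each edge.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete the last row and column, then take the determinant.
    minor = [row[:-1] for row in lap[:-1]]
    return int(determinant(minor))
```

For the complete graph on four vertices, `count_spanning_trees(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])` gives 16, in agreement with Cayley's formula 4^(4-2).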
