Improving controllability or the ability to manipulate one or more attributes of the generated data has become a topic of interest in the context of deep generative models of music. Recent attempts in this direction have relied on learning disentangled representations from data such that the underlying factors of variation are well separated. In this paper, we focus on the relationship between disentanglement and controllability by conducting a systematic study using different supervised disentanglement learning algorithms based on the Variational Auto-Encoder (VAE) architecture. Our experiments show that a high degree of disentanglement can be achieved by using different forms of supervision to train a strong discriminative encoder. However, in the absence of a strong generative decoder, disentanglement does not necessarily imply controllability. The structure of the latent space with respect to the VAE-decoder plays an important role in boosting the ability of a generative model to manipulate different attributes. To this end, we also propose methods and metrics to help evaluate the quality of a latent space with respect to the afforded degree of controllability.
Blood vessel segmentation is one of the most studied topics in computer vision, due to its relevance in daily clinical practice. Despite the evolution the field has been facing, especially after the dawn of deep learning, important challenges are still not solved. One of them concerns the consistency of the topological properties of the vascular trees, given that the best performing methodologies do not directly penalize mistakes such as broken segments and end up producing predictions with disconnected trees. This is particularly relevant in graph-like structures, such as blood vessel trees, given that it puts at risk the characterization steps that follow the segmentation task. In this paper, we propose a similarity index which captures the topological consistency of the predicted segmentations having as reference the ground truth. We also design a novel loss function based on the morphological closing operator and show how it allows to learn deep neural network models which produce more topologically coherent masks. Our experiments target well known retinal benchmarks and a coronary angiogram database.
Magnetic Particle Imaging (MPI) is a promising tracer-based, functional medical imaging technique which measures the non-linear response of superparamagnetic iron oxide nanoparticles (SPION) to a dynamic magnetic field. For image reconstruction, system matrices from time-consuming calibration scans are used predominantly. Finding modeled forward operators for magnetic particle imaging, which are able to compete with measured matrices in practice, is an ongoing topic of research. The existing models for magnetic particle imaging are by design not suitable for dynamic tracer concentrations. Neither modeled nor measured system matrices account for changes in the concentration during a single scanning cycle. In this paper we present a new MPI forward model for dynamic concentrations. A standard model will be introduced briefly, followed by the changes due to the dynamic behavior of the tracer concentration. Furthermore, the relevance of this new extended model is examined by investigating the influence of the extension and example reconstructions with the new and the standard model.
Text-to-speech is now able to achieve near-human naturalness and research focus has shifted to increasing expressivity. One popular method is to transfer the prosody from a reference speech sample. There have been considerable advances in using prosody transfer to generate more expressive speech, but the field lacks a clear definition of what successful prosody transfer means and a method for measuring it. We introduce a dataset of prosodically-varied reference natural speech samples for evaluating prosody transfer. The samples include global variations reflecting emotion and interpersonal attitude, and local variations reflecting topical emphasis, propositional attitude, syntactic phrasing and marked tonicity. The corpus only includes prosodic variations that listeners are able to distinguish with reasonable accuracy, and we report these figures as a benchmark against which text-to-speech prosody transfer can be compared. We conclude the paper with a demonstration of our proposed evaluation methodology, using the corpus to evaluate two text-to-speech models that perform prosody transfer.
Adversarial robustness has become a topic of growing interest in machine learning since it was observed that neural networks tend to be brittle. We propose an information-geometric formulation of adversarial defense and introduce FIRE, a new Fisher-Rao regularization for the categorical cross-entropy loss, which is based on the geodesic distance between natural and perturbed input features. Based on the information-geometric properties of the class of softmax distributions, we derive an explicit characterization of the Fisher-Rao Distance (FRD) for the binary and multiclass cases, and draw some interesting properties as well as connections with standard regularization metrics. Furthermore, for a simple linear and Gaussian model, we show that all Pareto-optimal points in the accuracy-robustness region can be reached by FIRE while other state-of-the-art methods fail. Empirically, we evaluate the performance of various classifiers trained with the proposed loss on standard datasets, showing up to 2\% of improvements in terms of robustness while reducing the training time by 20\% over the best-performing methods.
Recent years have witnessed the fast development of the emerging topic of Graph Learning based Recommender Systems (GLRS). GLRS employ advanced graph learning approaches to model users' preferences and intentions as well as items' characteristics for recommendations. Differently from other RS approaches, including content-based filtering and collaborative filtering, GLRS are built on graphs where the important objects, e.g., users, items, and attributes, are either explicitly or implicitly connected. With the rapid development of graph learning techniques, exploring and exploiting homogeneous or heterogeneous relations in graphs are a promising direction for building more effective RS. In this paper, we provide a systematic review of GLRS, by discussing how they extract important knowledge from graph-based representations to improve the accuracy, reliability and explainability of the recommendations. First, we characterize and formalize GLRS, and then summarize and categorize the key challenges and main progress in this novel research area. Finally, we share some new research directions in this vibrant area.
The significant increase in world population and urbanisation has brought several important challenges, in particular regarding the sustainability, maintenance and planning of urban mobility. At the same time, the exponential increase of computing capability and of available sensor and location data have offered the potential for innovative solutions to these challenges. In this work, we focus on the challenge of traffic forecasting and review the recent development and application of graph neural networks (GNN) to this problem. GNNs are a class of deep learning methods that directly process the input as graph data. This leverages more directly the spatial dependencies of traffic data and makes use of the advantages of deep learning producing state-of-the-art results. We introduce and review the emerging topic of GNNs, including their most common variants, with a focus on its application to traffic forecasting. We address the different ways of modelling traffic forecasting as a (temporal) graph, the different approaches developed so far to combine the graph and temporal learning components, as well as current limitations and research opportunities.
Since 2014, the NIH funded iDASH (integrating Data for Analysis, Anonymization, SHaring) National Center for Biomedical Computing has hosted yearly competitions on the topic of private computing for genomic data. For one track of the 2020 iteration of this competition, participants were challenged to produce an approach to federated learning (FL) training of genomic cancer prediction models using differential privacy (DP), with submissions ranked according to held-out test accuracy for a given set of DP budgets. More precisely, in this track, we are tasked with training a supervised model for the prediction of breast cancer occurrence from genomic data split between two virtual centers while ensuring data privacy with respect to model transfer via DP. In this article, we present our 3rd place submission to this competition. During the competition, we encountered two main challenges discussed in this article: i) ensuring correctness of the privacy budget evaluation and ii) achieving an acceptable trade-off between prediction performance and privacy budget.
Humans give contrastive explanations that explain why an observed event happened rather than some other counterfactual event (the contrast case). Despite the important role that contrastivity plays in how people generate and evaluate explanations, this property is largely missing from current methods for explaining NLP models. We present Minimal Contrastive Editing (MiCE), a method for generating contrastive explanations of model predictions in the form of edits to inputs that change model outputs to the contrast case. Our experiments across three tasks -- binary sentiment classification, topic classification, and multiple-choice question answering -- show that MiCE is able to produce edits that are not only contrastive, but also minimal and fluent, consistent with human contrastive edits. We demonstrate how MiCE edits can be used for two use cases in NLP system development -- uncovering dataset artifacts and debugging incorrect model predictions -- and thereby illustrate that generating contrastive explanations is a promising research direction for model interpretability.
The field of e-learning has emerged as a topic of interest in academia due to the increased ease of accessing the Internet using using smart-phones and wireless devices. One of the challenges facing e-learning platforms is how to keep students motivated and engaged. Moreover, it is also crucial to identify the students that might need help in order to make sure their academic performance doesn't suffer. To that end, this paper tries to investigate the relationship between student engagement and their academic performance. Apriori association rules algorithm is used to derive a set of rules that relate student engagement to academic performance. Experimental results' analysis done using confidence and lift metrics show that a positive correlation exists between students' engagement level and their academic performance in a blended e-learning environment. In particular, it is shown that higher engagement often leads to better academic performance. This cements the previous work that linked engagement and academic performance in traditional classrooms.