Emotion recognition as a key component of high-stake downstream applications has been shown to be effective, such as classroom engagement or mental health assessments. These systems are generally trained on small datasets collected in single laboratory environments, and hence falter when tested on data that has different noise characteristics. Multiple noise-based data augmentation approaches have been proposed to counteract this challenge in other speech domains. But, unlike speech recognition and speaker verification, in emotion recognition, noise-based data augmentation may change the underlying label of the original emotional sample. In this work, we generate realistic noisy samples of a well known emotion dataset (IEMOCAP) using multiple categories of environmental and synthetic noise. We evaluate how both human and machine emotion perception changes when noise is introduced. We find that some commonly used augmentation techniques for emotion recognition significantly change human perception, which may lead to unreliable evaluation metrics such as evaluating efficiency of adversarial attack. We also find that the trained state-of-the-art emotion recognition models fail to classify unseen noise-augmented samples, even when trained on noise augmented datasets. This finding demonstrates the brittleness of these systems in real-world conditions. We propose a set of recommendations for noise-based augmentation of emotion datasets and for how to deploy these emotion recognition systems "in the wild".
Racial disparity in academia is a widely acknowledged problem. The quantitative understanding of racial-based systemic inequalities is an important step towards a more equitable research system. However, few large-scale analyses have been performed on this topic, mostly because of the lack of robust race-disambiguation algorithms. Identifying author information does not generally include the author's race. Therefore, an algorithm needs to be employed, using known information about authors, i.e., their names, to infer their perceived race. Nevertheless, as any other algorithm, the process of racial inference can generate biases if it is not carefully considered. When the research is focused on the understanding of racial-based inequalities, such biases undermine the objectives of the investigation and may perpetuate inequities. The goal of this article is to assess the biases introduced by the different approaches used name-based racial inference. We use information from US census and mortgage applications to infer the race of US author names in the Web of Science. We estimate the effects of using given and family names, thresholds or continuous distributions, and imputation. Our results demonstrate that the validity of name-based inference varies by race and ethnicity and that threshold approaches underestimate Black authors and overestimate White authors. We conclude with recommendations to avoid potential biases. This article fills an important research gap that will allow more systematic and unbiased studies on racial disparity in science.
Graph embedding is a transformation of nodes of a graph into a set of vectors. A~good embedding should capture the graph topology, node-to-node relationship, and other relevant information about the graph, its subgraphs, and nodes. If these objectives are achieved, an embedding is a meaningful, understandable, compressed representations of a network that can be used for other machine learning tools such as node classification, community detection, or link prediction. The main challenge is that one needs to make sure that embeddings describe the properties of the graphs well. As a result, selecting the best embedding is a challenging task and very often requires domain experts. In this paper, we do a series of extensive experiments with selected graph embedding algorithms, both on real-world networks as well as artificially generated ones. Based on those experiments we formulate two general conclusions. First, if one needs to pick one embedding algorithm before running the experiments, then node2vec is the best choice as it performed best in our tests. Having said that, there is no single winner in all tests and, additionally, most embedding algorithms have hyperparameters that should be tuned and are randomized. Therefore, our main recommendation for practitioners is, if possible, to generate several embeddings for a problem at hand and then use a general framework that provides a tool for an unsupervised graph embedding comparison. This framework (introduced recently in the literature and easily available on GitHub repository) assigns the divergence score to embeddings to help distinguish good ones from bad ones.
Artificial intelligence (AI) has huge potential to improve the health and well-being of people, but adoption in clinical practice is still limited. Lack of transparency is identified as one of the main barriers to implementation, as clinicians should be confident the AI system can be trusted. Explainable AI has the potential to overcome this issue and can be a step towards trustworthy AI. In this paper we review the recent literature to provide guidance to researchers and practitioners on the design of explainable AI systems for the health-care domain and contribute to formalization of the field of explainable AI. We argue the reason to demand explainability determines what should be explained as this determines the relative importance of the properties of explainability (i.e. interpretability and fidelity). Based on this, we give concrete recommendations to choose between classes of explainable AI methods (explainable modelling versus post-hoc explanation; model-based, attribution-based, or example-based explanations; global and local explanations). Furthermore, we find that quantitative evaluation metrics, which are important for objective standardized evaluation, are still lacking for some properties (e.g. clarity) and types of explanators (e.g. example-based methods). We conclude that explainable modelling can contribute to trustworthy AI, but recognize that complementary measures might be needed to create trustworthy AI (e.g. reporting data quality, performing extensive (external) validation, and regulation).
Federated Learning (FL) is a recent approach for collaboratively training Machine Learning models on mobile edge devices, without private user data leaving the devices. The popular FL algorithm, Federated Averaging (FedAvg), suffers from poor convergence speed given non-iid user data. Furthermore, most existing work on FedAvg measures central-model accuracy, but in many cases, such as user content-recommendation, improving individual User model Accuracy (UA) is the real objective. To address these issues, we propose a Multi-Task Federated Learning (MTFL) system, which converges faster than FedAvg by using distributed Adam optimization (FedAdam), and benefits UA by introducing personal, non-federated 'patch' Batch-Normalization (BN) layers into the model. Testing FedAdam on the MNIST and CIFAR10 datasets show that it converges faster (up to 5x) than FedAvg in non-iid scenarios, and experiments using MTFL on the CIFAR10 dataset show that MTFL significantly improves average UA over FedAvg, by up to 54%. We also analyse the affect that private BN patches have on the MTFL model during inference, and give evidence that MTFL strikes a better balance between regularization and convergence in FL. Finally, we test the MTFL system on a mobile edge computing testbed, showing that MTFL's convergence and UA benefits outweigh its overhead.
Neural network models are widely used in solving many challenging problems, such as computer vision, personalized recommendation, and natural language processing. Those models are very computationally intensive and reach the hardware limit of the existing server and IoT devices. Thus, finding better model architectures with much less amount of computation while maximally preserving the accuracy is a popular research topic. Among various mechanisms that aim to reduce the computation complexity, identifying the zero values in the model weights and in the activations to avoid computing them is a promising direction. In this paper, we summarize the current status of the research on the computation of sparse neural networks, from the perspective of the sparse algorithms, the software frameworks, and the hardware accelerations. We observe that the search for the sparse structure can be a general methodology for high-quality model explorations, in addition to a strategy for high-efficiency model execution. We discuss the model accuracy influenced by the number of weight parameters and the structure of the model. The corresponding models are called to be located in the weight dominated and structure dominated regions, respectively. We show that for practically complicated problems, it is more beneficial to search large and sparse models in the weight dominated region. In order to achieve the goal, new approaches are required to search for proper sparse structures, and new sparse training hardware needs to be developed to facilitate fast iterations of sparse models.
It is well known that the historical logs are used for evaluating and learning policies in interactive systems, e.g. recommendation, search, and online advertising. Since direct online policy learning usually harms user experiences, it is more crucial to apply off-policy learning in real-world applications instead. Though there have been some existing works, most are focusing on learning with one single historical policy. However, in practice, usually a number of parallel experiments, e.g. multiple AB tests, are performed simultaneously. To make full use of such historical data, learning policies from multiple loggers becomes necessary. Motivated by this, in this paper, we investigate off-policy learning when the training data coming from multiple historical policies. Specifically, policies, e.g. neural networks, can be learned directly from multi-logger data, with counterfactual estimators. In order to understand the generalization ability of such estimator better, we conduct generalization error analysis for the empirical risk minimization problem. We then introduce the generalization error bound as the new risk function, which can be reduced to a constrained optimization problem. Finally, we give the corresponding learning algorithm for the new constrained problem, where we can appeal to the minimax problems to control the constraints. Extensive experiments on benchmark datasets demonstrate that the proposed methods achieve better performances than the state-of-the-arts.
Underwater images play a key role in ocean exploration, but often suffer from severe quality degradation due to light absorption and scattering in water medium. Although major breakthroughs have been made recently in the general area of image enhancement and restoration, the applicability of new methods for improving the quality of underwater images has not specifically been captured. In this paper, we review the image enhancement and restoration methods that tackle typical underwater image impairments, including some extreme degradations and distortions. Firstly, we introduce the key causes of quality reduction in underwater images, in terms of the underwater image formation model (IFM). Then, we review underwater restoration methods, considering both the IFM-free and the IFM-based approaches. Next, we present an experimental-based comparative evaluation of state-of-the-art IFM-free and IFM-based methods, considering also the prior-based parameter estimation algorithms of the IFM-based methods, using both subjective and objective analysis (the used code is freely available at https://github.com/wangyanckxx/Single-Underwater-Image-Enhancement-and-Color-Restoration). Starting from this study, we pinpoint the key shortcomings of existing methods, drawing recommendations for future research in this area. Our review of underwater image enhancement and restoration provides researchers with the necessary background to appreciate challenges and opportunities in this important field.
Matrix factorization (MF) has been widely used to discover the low-rank structure and to predict the missing entries of data matrix. In many real-world learning systems, the data matrix can be very high-dimensional but sparse. This poses an imbalanced learning problem, since the scale of missing entries is usually much larger than that of observed entries, but they cannot be ignored due to the valuable negative signal. For efficiency concern, existing work typically applies a uniform weight on missing entries to allow a fast learning algorithm. However, this simplification will decrease modeling fidelity, resulting in suboptimal performance for downstream applications. In this work, we weight the missing data non-uniformly, and more generically, we allow any weighting strategy on the missing data. To address the efficiency challenge, we propose a fast learning method, for which the time complexity is determined by the number of observed entries in the data matrix, rather than the matrix size. The key idea is two-fold: 1) we apply truncated SVD on the weight matrix to get a more compact representation of the weights, and 2) we learn MF parameters with element-wise alternating least squares (eALS) and memorize the key intermediate variables to avoid repeating computations that are unnecessary. We conduct extensive experiments on two recommendation benchmarks, demonstrating the correctness, efficiency, and effectiveness of our fast eALS method.