Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

MMCoVaR: Multimodal COVID-19 Vaccine Focused Data Repository for Fake News Detection and a Baseline Architecture for Classification

Sep 23, 2021
Mingxuan Chen, Xinqiao Chu, K. P. Subbalakshmi

The outbreak of COVID-19 has resulted in an "infodemic" that has encouraged the propagation of misinformation about COVID-19 and cure methods which, in turn, could negatively affect the adoption of recommended public health measures in the larger population. In this paper, we provide a new multimodal (consisting of images, text and temporal information) labeled dataset containing news articles and tweets on the COVID-19 vaccine. We collected 2,593 news articles from 80 publishers for one year between Feb 16th 2020 to May 8th 2021 and 24184 Twitter posts (collected between April 17th 2021 to May 8th 2021). We combine ratings from two news media ranking sites: Medias Bias Chart and Media Bias/Fact Check (MBFC) to classify the news dataset into two levels of credibility: reliable and unreliable. The combination of two filters allows for higher precision of labeling. We also propose a stance detection mechanism to annotate tweets into three levels of credibility: reliable, unreliable and inconclusive. We provide several statistics as well as other analytics like, publisher distribution, publication date distribution, topic analysis, etc. We also provide a novel architecture that classifies the news data into misinformation or truth to provide a baseline performance for this dataset. We find that the proposed architecture has an F-Score of 0.919 and accuracy of 0.882 for fake news detection. Furthermore, we provide benchmark performance for misinformation detection on tweet dataset. This new multimodal dataset can be used in research on COVID-19 vaccine, including misinformation detection, influence of fake COVID-19 vaccine information, etc.

* 12 pages, 8 figures. This paper has been accepted for publication in ASONAM 2021 
Access Paper or Ask Questions

Affect Analysis in-the-wild: Valence-Arousal, Expressions, Action Units and a Unified Framework

Mar 29, 2021
Dimitrios Kollias, Stefanos Zafeiriou

Affect recognition based on subjects' facial expressions has been a topic of major research in the attempt to generate machines that can understand the way subjects feel, act and react. In the past, due to the unavailability of large amounts of data captured in real-life situations, research has mainly focused on controlled environments. However, recently, social media and platforms have been widely used. Moreover, deep learning has emerged as a means to solve visual analysis and recognition problems. This paper exploits these advances and presents significant contributions for affect analysis and recognition in-the-wild. Affect analysis and recognition can be seen as a dual knowledge generation problem, involving: i) creation of new, large and rich in-the-wild databases and ii) design and training of novel deep neural architectures that are able to analyse affect over these databases and to successfully generalise their performance on other datasets. The paper focuses on large in-the-wild databases, i.e., Aff-Wild and Aff-Wild2 and presents the design of two classes of deep neural networks trained with these databases. The first class refers to uni-task affect recognition, focusing on prediction of the valence and arousal dimensional variables. The second class refers to estimation of all main behavior tasks, i.e. valence-arousal prediction; categorical emotion classification in seven basic facial expressions; facial Action Unit detection. A novel multi-task and holistic framework is presented which is able to jointly learn and effectively generalize and perform affect recognition over all existing in-the-wild databases. Large experimental studies illustrate the achieved performance improvement over the existing state-of-the-art in affect recognition.

Access Paper or Ask Questions

AbdomenCT-1K: Is Abdominal Organ Segmentation A Solved Problem?

Oct 28, 2020
Jun Ma, Yao Zhang, Song Gu, Yichi Zhang, Cheng Zhu, Qiyuan Wang, Xin Liu, Xingle An, Cheng Ge, Shucheng Cao, Qi Zhang, Shangqing Liu, Yunpeng Wang, Yuhui Li, Congcong Wang, Jian He, Xiaoping Yang

With the unprecedented developments in deep learning, automatic segmentation of main abdominal organs (i.e., liver, kidney, and spleen) seems to be a solved problem as the state-of-the-art (SOTA) methods have achieved comparable results with inter-observer variability on existing benchmark datasets. However, most of the existing abdominal organ segmentation benchmark datasets only contain single-center, single-phase, single-vendor, or single-disease cases, thus, it is unclear whether the excellent performance can generalize on more diverse datasets. In this paper, we present a large and diverse abdominal CT organ segmentation dataset, termed as AbdomenCT-1K, with more than 1000 (1K) CT scans from 11 countries, including multi-center, multi-phase, multi-vendor, and multi-disease cases. Furthermore, we conduct a large-scale study for liver, kidney, spleen, and pancreas segmentation, as well as reveal the unsolved segmentation problems of the SOTA method, such as the limited generalization ability on distinct medical centers, phases, and unseen diseases. To advance the unsolved problems, we build four organ segmentation benchmarks for fully supervised, semi-supervised, weakly supervised, and continual learning, which are currently challenging and active research topics. Accordingly, we develop a simple and effective method for each benchmark, which can be used as out-of-the-box methods and strong baselines. We believe the introduction of the AbdomenCT-1K dataset will promote future in-depth research towards clinical applicable abdominal organ segmentation methods. Moreover, the datasets, codes, and trained models of baseline methods will be publicly available at

* 18 pages 
Access Paper or Ask Questions

Graph Convolutional Networks Meet with High Dimensionality Reduction

Nov 07, 2019
Mustafa Coskun

Recently, Graph Convolutional Networks (GCNs) and their variants have been receiving many research interests for learning graph-related tasks. While the GCNs have been successfully applied to this problem, some caveats inherited from classical deep learning still remain as open research topics in the context of the node classification problem. One such inherited caveat is that GCNs only consider the nodes that are a few propagations away from the labeled nodes to classify them. However, taking only a few propagation steps away nodes into account defeats the purpose of using the graph topological information in the GCNs. To remedy this problem, the-state-of-the-art methods leverage the network diffusion approaches, namely personalized page rank and its variants, to fully account for the graph topology, {\em after} they use the Neural Networks in the GCNs. However, these approaches overlook the fact that the network diffusion methods favour high degree nodes in the graph, resulting in the propagation of labels to unlabeled centralized, hub, nodes. To address this biasing hub nodes problem, in this paper, we propose to utilize a dimensionality reduction technique conjugate with personalized page rank so that we can both take advantage from graph topology and resolve the hub node favouring problem for GCNs. Here, our approach opens a new holistic road for message passing phase of GCNs by suggesting the usage of other proximity matrices instead of well-known Laplacian. Testing on two real-world networks that are commonly used in benchmarking GCNs' performance for the node classification context, we systematically evaluate the performance of the proposed methodology and show that our approach outperforms existing methods for wide ranges of parameter values with very limited deep learning training {\em epochs}.

Access Paper or Ask Questions

Learning Representations of Social Media Users

Dec 02, 2018
Adrian Benton

User representations are routinely used in recommendation systems by platform developers, targeted advertisements by marketers, and by public policy researchers to gauge public opinion across demographic groups. Computer scientists consider the problem of inferring user representations more abstractly; how does one extract a stable user representation - effective for many downstream tasks - from a medium as noisy and complicated as social media? The quality of a user representation is ultimately task-dependent (e.g. does it improve classifier performance, make more accurate recommendations in a recommendation system) but there are proxies that are less sensitive to the specific task. Is the representation predictive of latent properties such as a person's demographic features, socioeconomic class, or mental health state? Is it predictive of the user's future behavior? In this thesis, we begin by showing how user representations can be learned from multiple types of user behavior on social media. We apply several extensions of generalized canonical correlation analysis to learn these representations and evaluate them at three tasks: predicting future hashtag mentions, friending behavior, and demographic features. We then show how user features can be employed as distant supervision to improve topic model fit. Finally, we show how user features can be integrated into and improve existing classifiers in the multitask learning framework. We treat user representations - ground truth gender and mental health features - as auxiliary tasks to improve mental health state prediction. We also use distributed user representations learned in the first chapter to improve tweet-level stance classifiers, showing that distant user information can inform classification tasks at the granularity of a single message.

* PhD thesis 
Access Paper or Ask Questions

Proceedings of the IJCAI 2017 Workshop on Learning in the Presence of Class Imbalance and Concept Drift (LPCICD'17)

Jul 28, 2017
Shuo Wang, Leandro L. Minku, Nitesh Chawla, Xin Yao

With the wide application of machine learning algorithms to the real world, class imbalance and concept drift have become crucial learning issues. Class imbalance happens when the data categories are not equally represented, i.e., at least one category is minority compared to other categories. It can cause learning bias towards the majority class and poor generalization. Concept drift is a change in the underlying distribution of the problem, and is a significant issue specially when learning from data streams. It requires learners to be adaptive to dynamic changes. Class imbalance and concept drift can significantly hinder predictive performance, and the problem becomes particularly challenging when they occur simultaneously. This challenge arises from the fact that one problem can affect the treatment of the other. For example, drift detection algorithms based on the traditional classification error may be sensitive to the imbalanced degree and become less effective; and class imbalance techniques need to be adaptive to changing imbalance rates, otherwise the class receiving the preferential treatment may not be the correct minority class at the current moment. Therefore, the mutual effect of class imbalance and concept drift should be considered during algorithm design. The aim of this workshop is to bring together researchers from the areas of class imbalance learning and concept drift in order to encourage discussions and new collaborations on solving the combined issue of class imbalance and concept drift. It provides a forum for international researchers and practitioners to share and discuss their original work on addressing new challenges and research issues in class imbalance learning, concept drift, and the combined issues of class imbalance and concept drift. The proceedings include 8 papers on these topics.

Access Paper or Ask Questions

Semantic Folding Theory And its Application in Semantic Fingerprinting

Mar 16, 2016
Francisco De Sousa Webber

Human language is recognized as a very complex domain since decades. No computer system has been able to reach human levels of performance so far. The only known computational system capable of proper language processing is the human brain. While we gather more and more data about the brain, its fundamental computational processes still remain obscure. The lack of a sound computational brain theory also prevents the fundamental understanding of Natural Language Processing. As always when science lacks a theoretical foundation, statistical modeling is applied to accommodate as many sampled real-world data as possible. An unsolved fundamental issue is the actual representation of language (data) within the brain, denoted as the Representational Problem. Starting with Jeff Hawkins' Hierarchical Temporal Memory (HTM) theory, a consistent computational theory of the human cortex, we have developed a corresponding theory of language data representation: The Semantic Folding Theory. The process of encoding words, by using a topographic semantic space as distributional reference frame into a sparse binary representational vector is called Semantic Folding and is the central topic of this document. Semantic Folding describes a method of converting language from its symbolic representation (text) into an explicit, semantically grounded representation that can be generically processed by Hawkins' HTM networks. As it turned out, this change in representation, by itself, can solve many complex NLP problems by applying Boolean operators and a generic similarity function like the Euclidian Distance. Many practical problems of statistical NLP systems, like the high cost of computation, the fundamental incongruity of precision and recall , the complex tuning procedures etc., can be elegantly overcome by applying Semantic Folding.

* 59 pages, white paper 
Access Paper or Ask Questions

A convex pseudo-likelihood framework for high dimensional partial correlation estimation with convergence guarantees

Aug 14, 2014
Kshitij Khare, Sang-Yun Oh, Bala Rajaratnam

Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply l1-penalties to either (1) parametric likelihoods, or, (2) regularized regression/pseudo-likelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudo-likelihood based objective functions have provable convergence guarantees, it is not clear if corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. This paper proposes a new pseudo-likelihood based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function comprised of quadratic forms. The objective is then optimized via a coordinate-wise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees; an important property that cannot be established using standard results, when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well-defined under very general conditions, and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, application to simulated/real data, timing comparisons and numerical convergence is demonstrated. We also present a novel unifying framework that places all graphical pseudo-likelihood methods as special cases of a more general formulation, leading to important insights.

Access Paper or Ask Questions

Deep Learning in Multimodal Remote Sensing Data Fusion: A Comprehensive Review

May 03, 2022
Jiaxin Li, Danfeng Hong, Lianru Gao, Jing Yao, Ke Zheng, Bing Zhang, Jocelyn Chanussot

With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is readily available nowadays, which renders researchers an opportunity to tackle current geoscience applications in a fresh way. With the joint utilization of EO data, much research on multimodal RS data fusion has made tremendous progress in recent years, yet these developed traditional algorithms inevitably meet the performance bottleneck due to the lack of the ability to comprehensively analyse and interpret these strongly heterogeneous data. Hence, this non-negligible limitation further arouses an intense demand for an alternative tool with powerful processing competence. Deep learning (DL), as a cutting-edge technology, has witnessed remarkable breakthroughs in numerous computer vision tasks owing to its impressive ability in data representation and reconstruction. Naturally, it has been successfully applied to the field of multimodal RS data fusion, yielding great improvement compared with traditional methods. This survey aims to present a systematic overview in DL-based multimodal RS data fusion. More specifically, some essential knowledge about this topic is first given. Subsequently, a literature survey is conducted to analyse the trends of this field. Some prevalent sub-fields in the multimodal RS data fusion are then reviewed in terms of the to-be-fused data modalities, i.e., spatiospectral, spatiotemporal, light detection and ranging-optical, synthetic aperture radar-optical, and RS-Geospatial Big Data fusion. Furthermore, We collect and summarize some valuable resources for the sake of the development in multimodal RS data fusion. Finally, the remaining challenges and potential future directions are highlighted.

Access Paper or Ask Questions

Does Configuration Encoding Matter in Learning Software Performance? An Empirical Study on Encoding Schemes

Apr 01, 2022
Jingzhi Gong, Tao Chen

Learning and predicting the performance of a configurable software system helps to provide better quality assurance. One important engineering decision therein is how to encode the configuration into the model built. Despite the presence of different encoding schemes, there is still little understanding of which is better and under what circumstances, as the community often relies on some general beliefs that inform the decision in an ad-hoc manner. To bridge this gap, in this paper, we empirically compared the widely used encoding schemes for software performance learning, namely label, scaled label, and one-hot encoding. The study covers five systems, seven models, and three encoding schemes, leading to 105 cases of investigation. Our key findings reveal that: (1) conducting trial-and-error to find the best encoding scheme in a case by case manner can be rather expensive, requiring up to 400+ hours on some models and systems; (2) the one-hot encoding often leads to the most accurate results while the scaled label encoding is generally weak on accuracy over different models; (3) conversely, the scaled label encoding tends to result in the fastest training time across the models/systems while the one-hot encoding is the slowest; (4) for all models studied, label and scaled label encoding often lead to relatively less biased outcomes between accuracy and training time, but the paired model varies according to the system. We discuss the actionable suggestions derived from our findings, hoping to provide a better understanding of this topic for the community. To promote open science, the data and code of this work can be publicly accessed at

* 13 pages, accepted by MSR2022 
Access Paper or Ask Questions