Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Transforming Fake News: Robust Generalisable News Classification Using Transformers

Sep 20, 2021
Ciara Blackledge, Amir Atapour-Abarghouei

As online news has become increasingly popular and fake news increasingly prevalent, the ability to audit the veracity of online news content has become more important than ever. Such a task represents a binary classification challenge, for which transformers have achieved state-of-the-art results. Using the publicly available ISOT and Combined Corpus datasets, this study explores transformers' abilities to identify fake news, with particular attention given to investigating generalisation to unseen datasets with varying styles, topics and class distributions. Moreover, we explore the idea that opinion-based news articles cannot be classified as real or fake due to their subjective nature and often sensationalised language, and propose a novel two-step classification pipeline to remove such articles from both model training and the final deployed inference system. Experiments over the ISOT and Combined Corpus datasets show that transformers achieve an increase in F1 scores of up to 4.9% for out of distribution generalisation compared to baseline approaches, with a further increase of 10.1% following the implementation of our two-step classification pipeline. To the best of our knowledge, this study is the first to investigate generalisation of transformers in this context.

* 9 pages 

  Access Paper or Ask Questions

Making Table Understanding Work in Practice

Sep 11, 2021
Madelon Hulsebos, Sneha Gathani, James Gale, Isil Dillig, Paul Groth, Çağatay Demiralp

Understanding the semantics of tables at scale is crucial for tasks like data integration, preparation, and search. Table understanding methods aim at detecting a table's topic, semantic column types, column relations, or entities. With the rise of deep learning, powerful models have been developed for these tasks with excellent accuracy on benchmarks. However, we observe that there exists a gap between the performance of these models on these benchmarks and their applicability in practice. In this paper, we address the question: what do we need for these models to work in practice? We discuss three challenges of deploying table understanding models and propose a framework to address them. These challenges include 1) difficulty in customizing models to specific domains, 2) lack of training data for typical database tables often found in enterprises, and 3) lack of confidence in the inferences made by models. We present SigmaTyper which implements this framework for the semantic column type detection task. SigmaTyper encapsulates a hybrid model trained on GitTables and integrates a lightweight human-in-the-loop approach to customize the model. Lastly, we highlight avenues for future research that further close the gap towards making table understanding effective in practice.

* Submitted to CIDR'22 

  Access Paper or Ask Questions

COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization

Sep 07, 2021
Manuel Madeira, Renato Negrinho, João Xavier, Pedro M. Q. Aguiar

First-order methods for stochastic optimization have undeniable relevance, in part due to their pivotal role in machine learning. Variance reduction for these algorithms has become an important research topic. In contrast to common approaches, which rarely leverage global models of the objective function, we exploit convexity and L-smoothness to improve the noisy estimates outputted by the stochastic gradient oracle. Our method, named COCO denoiser, is the joint maximum likelihood estimator of multiple function gradients from their noisy observations, subject to co-coercivity constraints between them. The resulting estimate is the solution of a convex Quadratically Constrained Quadratic Problem. Although this problem is expensive to solve by interior point methods, we exploit its structure to apply an accelerated first-order algorithm, the Fast Dual Proximal Gradient method. Besides analytically characterizing the proposed estimator, we show empirically that increasing the number and proximity of the queried points leads to better gradient estimates. We also apply COCO in stochastic settings by plugging it in existing algorithms, such as SGD, Adam or STRSAGA, outperforming their vanilla versions, even in scenarios where our modelling assumptions are mismatched.

* 25 pages, 14 figures 

  Access Paper or Ask Questions

Approximation Methods for Partially Observed Markov Decision Processes (POMDPs)

Aug 31, 2021
Caleb M. Bowyer

POMDPs are useful models for systems where the true underlying state is not known completely to an outside observer; the outside observer incompletely knows the true state of the system, and observes a noisy version of the true system state. When the number of system states is large in a POMDP that often necessitates the use of approximation methods to obtain near optimal solutions for control. This survey is centered around the origins, theory, and approximations of finite-state POMDPs. In order to understand POMDPs, it is required to have an understanding of finite-state Markov Decision Processes (MDPs) in \autoref{mdp} and Hidden Markov Models (HMMs) in \autoref{hmm}. For this background theory, I provide only essential details on MDPs and HMMs and leave longer expositions to textbook treatments before diving into the main topics of POMDPs. Once the required background is covered, the POMDP is introduced in \autoref{pomdp}. The origins of the POMDP are explained in the classical papers section \autoref{classical}. Once the high computational requirements are understood from the exact methodological point of view, the main approximation methods are surveyed in \autoref{approximations}. Then, I end the survey with some new research directions in \autoref{conclusion}.

  Access Paper or Ask Questions

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

Jul 06, 2021
Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning

Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition. However, we uncover a striking contrast to this promise: across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning approaches fail to outperform random selection. To understand this discrepancy, we profile 8 active learning methods on a per-example basis, and identify the problem as collective outliers -- groups of examples that active learning methods prefer to acquire but models fail to learn (e.g., questions that ask about text in images or require external knowledge). Through systematic ablation experiments and qualitative visualizations, we verify that collective outliers are a general phenomenon responsible for degrading pool-based active learning. Notably, we show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases. We conclude with a discussion and prescriptive recommendations for mitigating the effects of these outliers in future work.

* Accepted at ACL-IJCNLP 2021. 17 pages, 16 Figures 

  Access Paper or Ask Questions

A Review on Explainability in Multimodal Deep Neural Nets

May 18, 2021
Gargi Joshi, Rahee Walambe, Ketan Kotecha

Artificial Intelligence techniques powered by deep neural nets have achieved much success in several application domains, most significantly and notably in the Computer Vision applications and Natural Language Processing tasks. Surpassing human-level performance propelled the research in the applications where different modalities amongst language, vision, sensory, text play an important role in accurate predictions and identification. Several multimodal fusion methods employing deep learning models are proposed in the literature. Despite their outstanding performance, the complex, opaque and black-box nature of the deep neural nets limits their social acceptance and usability. This has given rise to the quest for model interpretability and explainability, more so in the complex tasks involving multimodal AI methods. This paper extensively reviews the present literature to present a comprehensive survey and commentary on the explainability in multimodal deep neural nets, especially for the vision and language tasks. Several topics on multimodal AI and its applications for generic domains have been covered in this paper, including the significance, datasets, fundamental building blocks of the methods and techniques, challenges, applications, and future trends in this domain

* in IEEE Access, vol. 9, pp. 59800-59821, 2021 
* 24 pages 6 figures 

  Access Paper or Ask Questions

A Framework for Unsupervised Classificiation and Data Mining of Tweets about Cyber Vulnerabilities

Apr 23, 2021
Kenneth Alperin, Emily Joback, Leslie Shing, Gabe Elkin

Many cyber network defense tools rely on the National Vulnerability Database (NVD) to provide timely information on known vulnerabilities that exist within systems on a given network. However, recent studies have indicated that the NVD is not always up to date, with known vulnerabilities being discussed publicly on social media platforms, like Twitter and Reddit, months before they are published to the NVD. To that end, we present a framework for unsupervised classification to filter tweets for relevance to cyber security. We consider and evaluate two unsupervised machine learning techniques for inclusion in our framework, and show that zero-shot classification using a Bidirectional and Auto-Regressive Transformers (BART) model outperforms the other technique with 83.52% accuracy and a F1 score of 83.88, allowing for accurate filtering of tweets without human intervention or labelled data for training. Additionally, we discuss different insights that can be derived from these cyber-relevant tweets, such as trending topics of tweets and the counts of Twitter mentions for Common Vulnerabilities and Exposures (CVEs), that can be used in an alert or report to augment current NVD-based risk assessment tools.

  Access Paper or Ask Questions

Biomedical Question Answering: A Comprehensive Review

Feb 10, 2021
Qiao Jin, Zheng Yuan, Guangzhi Xiong, Qianlan Yu, Chuanqi Tan, Mosha Chen, Songfang Huang, Xiaozhong Liu, Sheng Yu

Question Answering (QA) is a benchmark Natural Language Processing (NLP) task where models predict the answer for a given question using related documents, images, knowledge bases and question-answer pairs. Automatic QA has been successfully applied in various domains like search engines and chatbots. However, for specific domains like biomedicine, QA systems are still rarely used in real-life settings. Biomedical QA (BQA), as an emerging QA task, enables innovative applications to effectively perceive, access and understand complex biomedical knowledge. In this work, we provide a critical review of recent efforts in BQA. We comprehensively investigate prior BQA approaches, which are classified into 6 major methodologies (open-domain, knowledge base, information retrieval, machine reading comprehension, question entailment and visual QA), 4 topics of contents (scientific, clinical, consumer health and examination) and 5 types of formats (yes/no, extraction, generation, multi-choice and retrieval). In the end, we highlight several key challenges of BQA and explore potential directions for future works.

* Draft 

  Access Paper or Ask Questions

This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning

Dec 22, 2020
Silvan Mertes, Tobias Huber, Katharina Weitz, Alexander Heimerl, Elisabeth André

With the ongoing rise of machine learning, the need for methods for explaining decisions made by artificial intelligence systems is becoming a more and more important topic. Especially for image classification tasks, many state-of-the-art tools to explain such classifiers rely on visual highlighting of important areas of the input data. Contrary, counterfactual explanation systems try to enable a counterfactual reasoning by modifying the input image in a way such that the classifier would have made a different prediction. By doing so, the users of counterfactual explanation systems are equipped with a completely different kind of explanatory information. However, methods for generating realistic counterfactual explanations for image classifiers are still rare. In this work, we present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques. Additionally, we conduct a user study to evaluate our approach in a use case which was inspired by a healthcare scenario. Our results show that our approach leads to significantly better results regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the art systems that work with saliency maps, namely LIME and LRP.

  Access Paper or Ask Questions

Source Code Classification for Energy Efficiency in Parallel Ultra Low-Power Microcontrollers

Dec 12, 2020
Emanuele Parisi, Francesco Barchi, Andrea Bartolini, Giuseppe Tagliavini, Andrea Acquaviva

The analysis of source code through machine learning techniques is an increasingly explored research topic aiming at increasing smartness in the software toolchain to exploit modern architectures in the best possible way. In the case of low-power, parallel embedded architectures, this means finding the configuration, for instance in terms of the number of cores, leading to minimum energy consumption. Depending on the kernel to be executed, the energy optimal scaling configuration is not trivial. While recent work has focused on general-purpose systems to learn and predict the best execution target in terms of the execution time of a snippet of code or kernel (e.g. offload OpenCL kernel on multicore CPU or GPU), in this work we focus on static compile-time features to assess if they can be successfully used to predict the minimum energy configuration on PULP, an ultra-low-power architecture featuring an on-chip cluster of RISC-V processors. Experiments show that using machine learning models on the source code to select the best energy scaling configuration automatically is viable and has the potential to be used in the context of automatic system configuration for energy minimisation.

  Access Paper or Ask Questions