"Topic": models, code, and papers

Do You Think It's Biased? How To Ask For The Perception Of Media Bias

Dec 16, 2021
Timo Spinde, Christina Kreuter, Wolfgang Gaissmaier, Felix Hamborg, Bela Gipp, Helge Giese

Media coverage has a substantial effect on the public perception of events. The way the media frame events can significantly alter the beliefs and perceptions of our society. Nevertheless, nearly all media outlets are known to report news in a biased way. While such bias can be introduced by altering word choice or omitting information, the perception of bias also varies considerably with a reader's personal background. Media bias is therefore a very complex construct to identify and analyze. Even though media bias has been the subject of many studies, previous assessment strategies are oversimplified and lack overlap and empirical evaluation. This study therefore aims to develop a scale that can serve as a reliable standard for evaluating article bias. To name an example: intending to measure bias in a news article, should we ask, "How biased is the article?", or should we instead ask, "How did the article treat the American president?" We conducted a literature search and found 824 relevant questions about text perception in previous research on the topic. In a multi-iteration process, we summarized and condensed these questions semantically to arrive at a complete and representative set of possible question types about bias. The final set consisted of 25 questions with varying answer formats, 17 questions using semantic differentials, and six ratings of feelings. We tested each question on 190 articles with a total of 663 participants to identify how well the questions measure an article's perceived bias. Our results show that 21 final items are suitable and reliable for measuring the perception of media bias. We publish the final set of questions at http://bias-question-tree.gipplab.org/.
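
The reliability claim for the 21 final items is the kind of property usually checked with an internal-consistency statistic. Below is a minimal sketch, not from the paper, of computing Cronbach's alpha over a hypothetical article-by-item rating matrix; all data and dimensions are made up for illustration.

```python
# Cronbach's alpha on a hypothetical (articles x items) rating matrix;
# a common internal-consistency check for a multi-item scale.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = ratings.shape[1]
    item_var = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(0)
ratings = rng.integers(1, 8, size=(30, 21)).astype(float)  # 7-point items
print(f"alpha = {cronbach_alpha(ratings):.2f}")  # random data, so alpha is low
```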


Out-of-Distribution Detection of Melanoma using Normalizing Flows

Mar 23, 2021
M. M. A. Valiuddin, C. G. A. Viviers

Generative modelling has been at the forefront of machine learning research for a substantial amount of time. With the recent success of machine learning, especially deep learning, there has been increased interest in explainable and interpretable models. The ability to model distributions and provide insight into density estimation and exact data likelihoods is an example of such a feature. Normalizing Flows (NFs), a relatively new branch of generative modelling, have received substantial attention because they offer exactly this at relatively low cost while achieving competitive generative results. While the generative abilities of NFs are typically what is explored, we focus on their data-distribution modelling for Out-of-Distribution (OOD) detection. Using one of the state-of-the-art NF models, GLOW, we attempt to detect OOD examples in the ISIC dataset. We observe that this model underperforms compared to related research. To improve OOD detection, we explore masking methods to inhibit co-adaptation of the coupling layers, but find no substantial improvement. Furthermore, we utilize Wavelet Flow, which uses wavelets that can filter particular frequency components, thus simplifying the modelling process to data-driven conditional wavelet coefficients instead of complete images. This enables us to efficiently model larger-resolution images, in the hope of capturing more features relevant for OOD detection. The paper that introduced Wavelet Flow focuses mainly on its ability to sample high-resolution images and did not address OOD detection. We present our results and propose several ideas for improvement, such as controlling frequency components, using different wavelets, and using other state-of-the-art NF architectures.
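
The underlying OOD mechanism is likelihood thresholding: the trained flow assigns each input a log-likelihood, and inputs scoring below a threshold calibrated on in-distribution data are flagged. A minimal sketch follows, with a 2-D Gaussian standing in for the trained GLOW density, since the actual model and the ISIC data are not reproduced here.

```python
# Likelihood-threshold OOD detection in miniature: score inputs with the
# model's log-likelihood and flag those below a threshold calibrated on
# in-distribution data. A 2-D Gaussian stands in for the trained flow.
import numpy as np
from scipy.stats import multivariate_normal

density = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))  # stand-in for GLOW

in_dist = density.rvs(size=1000, random_state=0)       # stand-in for training data
threshold = np.percentile(density.logpdf(in_dist), 5)  # keep 95% of in-dist points

test_points = np.array([[0.1, -0.2], [4.0, 4.0]])
print(density.logpdf(test_points) < threshold)         # -> [False  True]
```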


EPIC30M: An Epidemics Corpus Of Over 30 Million Relevant Tweets

Jun 22, 2020
Junhua Liu, Trisha Singhal, Lucienne T. M. Blessing, Kristin L. Wood, Kwan Hui Lim

Since the start of COVID-19, several relevant corpora from various sources containing millions of data points have been presented in the literature. While these corpora are valuable for supporting many analyses of this specific pandemic, researchers require additional benchmark corpora covering other epidemics to facilitate cross-epidemic pattern recognition and trend analysis. During our other COVID-19-related work, we found very few disease-related corpora in the literature that are sizable and rich enough to support such cross-epidemic analyses. In this paper, we present EPIC30M, a large-scale epidemic corpus of 30 million micro-blog posts, i.e., tweets crawled from Twitter, from 2006 to 2020. EPIC30M contains a subset of 26.2 million tweets related to three general diseases, namely Ebola, Cholera, and Swine Flu, and another subset of 4.7 million tweets on six global epidemic outbreaks: the 2009 H1N1 Swine Flu, 2010 Haiti Cholera, 2012 Middle East Respiratory Syndrome (MERS), 2013 West African Ebola, 2016 Yemen Cholera, and 2018 Kivu Ebola outbreaks. Furthermore, we explore and discuss the properties of the corpus with statistics of key terms and hashtags and trend analyses for each subset. Finally, we demonstrate the value and impact that EPIC30M could create through a discussion of multiple cross-epidemic research use cases that have attracted growing interest in recent years. These use cases span multiple research areas, such as epidemiological modeling, pattern recognition, natural language understanding, and economic modeling.
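
As a rough illustration of the per-subset trend analysis such a corpus enables, the sketch below counts keyword mentions per year over a few hypothetical tweet records; the records and field names are invented, not the actual corpus schema.

```python
# Counting keyword mentions per year over tweet records.
from collections import Counter

tweets = [  # hypothetical records for illustration
    {"year": 2014, "text": "West African Ebola outbreak worsens #ebola"},
    {"year": 2014, "text": "New #ebola treatment centres open"},
    {"year": 2016, "text": "Cholera cases rising in Yemen #cholera"},
]

def yearly_counts(records, keyword):
    return Counter(r["year"] for r in records if keyword in r["text"].lower())

print(yearly_counts(tweets, "ebola"))    # Counter({2014: 2})
print(yearly_counts(tweets, "cholera"))  # Counter({2016: 1})
```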


EPIC: An Epidemics Corpus Of Over 20 Million Relevant Tweets

Jun 09, 2020
Junhua Liu, Trisha Singhal, Lucienne T. M. Blessing, Kristin L. Wood, Kwan Hui Lim

Since the start of COVID-19, several relevant corpora from various sources containing millions of data points have been presented in the literature. While these corpora are valuable for supporting many analyses of this specific pandemic, researchers require additional benchmark corpora covering other epidemics to facilitate cross-epidemic pattern recognition and trend analysis. During our other COVID-19-related work, we found very few disease-related corpora in the literature that are sizable and rich enough to support such cross-epidemic analyses. In this paper, we present EPIC, a large-scale epidemic corpus of 20 million micro-blog posts, i.e., tweets crawled from Twitter, from 2006 to 2020. EPIC contains a subset of 17.8 million tweets related to three general diseases, namely Ebola, Cholera, and Swine Flu, and another subset of 3.5 million tweets on six global epidemic outbreaks: the 2009 H1N1 Swine Flu, 2010 Haiti Cholera, 2012 Middle East Respiratory Syndrome (MERS), 2013 West African Ebola, 2016 Yemen Cholera, and 2018 Kivu Ebola outbreaks. Furthermore, we explore and discuss the properties of the corpus with statistics of key terms and hashtags and trend analyses for each subset. Finally, we demonstrate the value and impact that EPIC could create through a discussion of multiple cross-epidemic research use cases that have attracted growing interest in recent years. These use cases span multiple research areas, such as epidemiological modeling, pattern recognition, natural language understanding, and economic modeling.


AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments

May 29, 2020
Lucia Schuler, Somaya Jamil, Niklas Kühl

Serverless computing has emerged as a compelling new paradigm among cloud computing models in recent years. It promises services at large scale and low cost while eliminating the need for infrastructure management by the user. On the cloud provider's side, flexible resource management is required to meet fluctuating demand; it can be enabled through automated provisioning and deprovisioning of resources. A common approach among both commercial and open-source serverless platforms is workload-based auto-scaling, where a designated algorithm scales instances according to the number of incoming requests. The recently evolving serverless framework Knative proposes a request-based policy, where the algorithm scales resources according to a configured maximum number of requests that can be processed in parallel per instance, the so-called concurrency. As we show in a baseline experiment, this predefined concurrency level can strongly influence the performance of a serverless application. However, identifying the concurrency configuration that yields the highest possible quality of service is challenging due to various factors influencing throughput and latency, e.g., varying workloads and complex infrastructure characteristics. While there has been considerable research into intelligent techniques for optimizing auto-scaling in virtual machine provisioning, this topic has not yet been addressed for serverless computing. For this reason, we investigate the applicability of a reinforcement learning approach, proven effective for dynamic virtual machine provisioning, to request-based auto-scaling in a serverless framework. Our results show that, within a limited number of iterations, our proposed model learns an effective scaling policy per workload, improving performance compared to the default auto-scaling configuration.
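
A minimal sketch of the idea, reduced to a single-state (bandit-style) Q-learning loop: the agent picks a concurrency level, observes a reward derived from measured performance, and updates its value estimates. The action space, reward shape, and hyperparameters below are hypothetical, not the authors' model.

```python
# Single-state Q-learning sketch of request-based auto-scaling: pick a
# per-instance concurrency level, observe a performance-derived reward,
# and update the value table.
import random

CONCURRENCY_LEVELS = [10, 50, 100, 200]
q = {c: 0.0 for c in CONCURRENCY_LEVELS}   # value estimate per action
alpha, epsilon = 0.1, 0.2                  # learning rate, exploration rate

def observe_reward(concurrency: int) -> float:
    """Stand-in for a real measurement; peaks at a hidden sweet spot."""
    return -abs(concurrency - 100) / 100 + random.gauss(0, 0.05)

random.seed(0)
for _ in range(500):
    if random.random() < epsilon:
        action = random.choice(CONCURRENCY_LEVELS)   # explore
    else:
        action = max(q, key=q.get)                   # exploit
    q[action] += alpha * (observe_reward(action) - q[action])

print(max(q, key=q.get))   # typically converges to 100 here
```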

* 8 pages, 7 figures 

SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search

May 27, 2020
Tom Hope, Jason Portenoy, Kishore Vasan, Jonathan Borchardt, Eric Horvitz, Daniel S. Weld, Marti A. Hearst, Jevin West

The COVID-19 pandemic has sparked an unprecedented mobilization of scientists, already generating thousands of new papers that join a litany of previous biomedical work in related areas. This deluge of information makes it hard for researchers to keep track of their own research area, let alone explore new directions. Standard search engines are designed primarily for targeted search and are not geared toward discovery or making connections that are not obvious from reading individual papers. In this paper, we present our ongoing work on SciSight, a novel framework for exploratory search of COVID-19 research. Based on formative interviews with scientists and a review of existing tools, we build and integrate two key capabilities: first, exploring interactions between biomedical facets (e.g., proteins, genes, drugs, diseases, patient characteristics); and second, discovering groups of researchers and how they are connected. We extract entities using a language model pre-trained on several biomedical information extraction tasks, and enrich them with data from the Microsoft Academic Graph (MAG). To find research groups automatically, we use hierarchical clustering with overlap, allowing authors to belong to multiple groups, as they do in practice. Finally, we introduce a novel presentation of these groups based on both topical and social affinities, allowing users to drill down from groups to papers to associations between entities, and updating query suggestions on the fly with the goal of facilitating exploratory navigation. SciSight has thus far served over 10K users with over 30K page views and 13% returning users. Preliminary user interviews with biomedical researchers suggest that SciSight complements current approaches and helps them find new and relevant knowledge.
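
To illustrate group detection with overlap, the sketch below runs k-clique percolation, which, like the paper's hierarchical clustering with overlap, lets an author belong to more than one group. It is a stand-in, not the method SciSight actually uses, and the co-authorship edges are invented.

```python
# Overlapping co-author groups via k-clique percolation (a stand-in
# for hierarchical clustering with overlap).
import networkx as nx
from networkx.algorithms.community import k_clique_communities

g = nx.Graph()
g.add_edges_from([
    ("ana", "bo"), ("bo", "chen"), ("ana", "chen"),    # group 1
    ("chen", "dev"), ("dev", "eli"), ("chen", "eli"),  # group 2
])

for group in k_clique_communities(g, 3):
    print(sorted(group))   # "chen" appears in both groups
```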

* http://scisight.apps.allenai.org/ 

Building A User-Centric and Content-Driven Socialbot

May 06, 2020
Hao Fang

To build Sounding Board, we develop a system architecture capable of accommodating the dialog strategies we designed for socialbot conversations. The architecture consists of a multi-dimensional language understanding module for analyzing user utterances, a hierarchical dialog management framework for dialog context tracking and complex dialog control, and a language generation process that realizes the response plan and makes adjustments for speech synthesis. Additionally, we construct a new knowledge base to power the socialbot by collecting social chat content from a variety of sources. An important contribution of the system is the synergy between the knowledge base and the dialog management, i.e., the use of a graph structure to organize the knowledge base, which makes dialog control very efficient at bringing related content into the discussion. Using the data collected from Sounding Board during the competition, we carry out in-depth analyses of socialbot conversations and user ratings, which provide valuable insights into evaluation methods for socialbots. We additionally investigate a new approach for system evaluation and diagnosis that allows scoring individual dialog segments in the conversation. Finally, observing that socialbots suffer from shallow conversations about topics associated with unstructured data, we study the problem of enabling extended socialbot conversations grounded in a document. To bring together machine reading and dialog control techniques, a graph-based document representation is proposed, together with methods for automatically constructing the graph. Using this representation, dialog control can be carried out by retrieving nodes or moving along edges in the graph; a sketch of the idea follows below. To illustrate its usage, a mixed-initiative dialog strategy is designed for socialbot conversations on news articles.
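
A minimal sketch of the graph idea: content nodes linked by typed edges, so that dialog control amounts to retrieving a node and walking its edges to related content. The node structure and relation names are hypothetical, not Sounding Board's actual schema.

```python
# Content nodes linked by typed edges; dialog control is a node lookup
# plus an edge traversal.
graph = {
    "news:mars_rover": {
        "text": "A new rover landed on Mars this week.",
        "edges": {"topic": "topic:space", "followup": "news:mars_samples"},
    },
    "news:mars_samples": {
        "text": "The rover will cache soil samples for a future return trip.",
        "edges": {"topic": "topic:space"},
    },
    "topic:space": {"text": "Space exploration.", "edges": {}},
}

def next_content(node: str, relation: str):
    """Move along one edge and return the related content, if any."""
    target = graph[node]["edges"].get(relation)
    return graph[target]["text"] if target else None

print(next_content("news:mars_rover", "followup"))
```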

* PhD thesis 

When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs)

Mar 06, 2020
Victor Villena-Martinez, Sergiu Oprea, Marcelo Saval-Calvo, Jorge Azorin-Lopez, Andres Fuster-Guillo, Robert B. Fisher

Registration is the process of computing the transformation that aligns sets of data. Commonly, a registration process can be divided into four main steps: target selection, feature extraction, feature matching, and transform computation for the alignment. The accuracy of the result depends on multiple factors, the most significant being the quantity of input data; the presence of noise, outliers, and occlusions; the quality of the extracted features; real-time requirements; and the type of transformation, especially transformations defined by multiple parameters, like non-rigid deformations. Recent advances in machine learning could be a turning point on these issues, particularly the development of deep learning (DL) techniques, which are helping to improve multiple computer vision problems through an abstract understanding of the input data. In this paper, a review of deep learning-based registration methods is presented. We classify the papers using a framework extracted from the traditional registration pipeline in order to analyse the strengths of the new learning-based proposals. Deep Registration Networks (DRNs) try to solve the alignment task either by replacing part of the traditional pipeline with a network or by solving the registration problem fully. The main conclusions are: 1) learning-based registration techniques cannot always be clearly classified within the traditional pipeline; 2) these approaches allow more complex inputs, such as conceptual models, in addition to traditional 3D datasets; 3) in spite of the generality of learning, the current proposals are still ad hoc solutions; and 4) this is a young topic that still requires a large effort to reach general solutions able to cope with the problems that affect traditional approaches.
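
For concreteness, here is a minimal sketch of the traditional pipeline's final step on toy 3-D points: a rigid transform recovered with the SVD-based Kabsch method. Correspondences are known by construction in this toy, whereas a real pipeline would first extract and match features.

```python
# Transform computation for rigid alignment via the Kabsch method.
import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t with dst ~ R @ src + t."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    u, _, vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, dst.mean(0) - r @ src.mean(0)

rng = np.random.default_rng(0)
source = rng.normal(size=(100, 3))
angle = np.pi / 8                              # ground-truth rotation about z
r_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
target = source @ r_true.T + np.array([0.5, -0.2, 0.1])

r_est, t_est = rigid_transform(source, target)
print(np.allclose(source @ r_est.T + t_est, target))  # -> True
```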

* Submitted to Pattern Recognition 
