Pawan Goyal

A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles

Jan 24, 2021
Rajkumar Pujari, Swara Desai, Niloy Ganguly, Pawan Goyal

This paper presents a novel two-stage framework to extract opinionated sentences from a given news article. In the first stage, a Naive Bayes classifier uses local features to assign each sentence a score: the probability that the sentence is opinionated. In the second stage, we use this prior within the HITS (Hyperlink-Induced Topic Search) schema to exploit the global structure of the article and the relations between sentences. In the HITS schema, the opinionated sentences are treated as Hubs and the facts around these opinions are treated as the Authorities. The algorithm is implemented and evaluated against a set of manually marked data. We show that using HITS significantly improves the precision over the baseline Naive Bayes classifier. We also argue that the proposed method actually discovers the underlying structure of the article, thus extracting various opinions, grouped with supporting facts as well as other supporting opinions from the article.

* Presented as a talk at TextGraphs-9: the workshop on Graph-based Methods for Natural Language Processing at EMNLP 2014
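
The two-stage idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the similarity matrix, the prior values, and the choice to weight each edge i -> j by prior[i] * sim[i][j] are all assumptions made for the example. It runs standard HITS power iteration in plain Python, with per-sentence opinion probabilities (standing in for the Naive Bayes scores) biasing which sentences become strong hubs.

```python
def hits_with_prior(sim, prior, iterations=50):
    """HITS power iteration on a sentence graph.

    sim[i][j] : hypothetical similarity between sentences i and j
    prior[i]  : hypothetical stage-one probability that sentence i is opinionated
    Edge weight i -> j is prior[i] * sim[i][j] (an assumption for this sketch).
    """
    n = len(sim)
    adj = [[prior[i] * sim[i][j] for j in range(n)] for i in range(n)]
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # Authority score: weighted sum of hub scores of nodes pointing at j.
        auths = [sum(adj[i][j] * hubs[i] for i in range(n)) for j in range(n)]
        norm = sum(a * a for a in auths) ** 0.5 or 1.0
        auths = [a / norm for a in auths]
        # Hub score: weighted sum of authority scores of nodes that i points at.
        hubs = [sum(adj[i][j] * auths[j] for j in range(n)) for i in range(n)]
        norm = sum(h * h for h in hubs) ** 0.5 or 1.0
        hubs = [h / norm for h in hubs]
    return hubs, auths


# Toy article with three sentences; sentence 0 has a high opinion prior.
sim = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
prior = [0.9, 0.1, 0.1]
hubs, auths = hits_with_prior(sim, prior)
ranked = sorted(range(len(hubs)), key=lambda i: hubs[i], reverse=True)
print("sentences ranked by hub score:", ranked)
```

In this toy setup the sentence with the high prior ends up as the strongest hub, while the sentences it links to score higher as authorities. How the actual system builds edges and combines the prior is described in the paper, not here.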

Reproducibility, Replicability and Beyond: Assessing Production Readiness of Aspect Based Sentiment Analysis in the Wild

Jan 23, 2021
Rajdeep Mukherjee, Shreyas Shetty, Subrata Chattopadhyay, Subhadeep Maji, Samik Datta, Pawan Goyal

With the exponential growth of online marketplaces and user-generated content therein, aspect-based sentiment analysis has become more important than ever. In this work, we critically review a representative sample of the models published during the past six years through the lens of a practitioner, with an eye towards deployment in production. First, our rigorous empirical evaluation reveals poor reproducibility: an average 4-5% drop in test accuracy across the sample. Second, to further bolster our confidence in empirical evaluation, we report experiments on two challenging data slices, and observe a consistent 12-55% drop in accuracy. Third, we study the possibility of transfer across domains and observe that as little as 10-25% of the domain-specific training dataset, when used in conjunction with datasets from other domains within the same locale, largely closes the gap between complete cross-domain and complete in-domain predictive performance. Lastly, we open-source two large-scale annotated review corpora from a large e-commerce portal in India in order to aid the study of replicability and transfer, with the hope that it will fuel further growth of the field.

* 12 pages, accepted at ECIR 2021

Joint Autoregressive and Graph Models for Software and Developer Social Networks

Jan 21, 2021
Rima Hazra, Hardik Aggarwal, Pawan Goyal, Animesh Mukherjee, Soumen Chakrabarti

Social network research has focused on hyperlink graphs, bibliographic citations, friend/follow patterns, influence spread, etc. Large software repositories also form a highly valuable networked artifact, usually in the form of a collection of packages, their developers, dependencies among them, and bug reports. This "social network of code" is rarely studied by social network researchers. We introduce two new problems in this setting. These problems are well-motivated in the software engineering community but not closely studied by social network scientists. The first is to identify packages that are most likely to be troubled by bugs in the immediate future, thereby demanding the greatest attention. The second is to recommend developers to packages for the next development cycle. Simple autoregression can be applied to historical data for both problems, but we propose a novel method to integrate network-derived features and demonstrate that our method brings additional benefits. Apart from formalizing these problems and proposing new baseline approaches, we prepare and contribute a substantial dataset connecting multiple attributes built from the long-term history of 20 releases of Ubuntu, growing to over 25,000 packages with their dependency links, maintained by over 3,800 developers, with over 280k bug reports.

* Accepted at ECIR 2021

Medical Entity Linking using Triplet Network

Dec 21, 2020
Ishani Mondal, Sukannya Purkayastha, Sudeshna Sarkar, Pawan Goyal, Jitesh Pillai, Amitava Bhattacharyya, Mahanandeeshwar Gattu

Entity linking (or normalization) is an essential task in text mining that maps the entity mentions in medical text to standard entities in a given Knowledge Base (KB). This task is of great importance in the medical domain. It can also be used for merging different medical and clinical ontologies. In this paper, we focus on the problem of disease linking or normalization. This task is executed in two phases: candidate generation and candidate scoring. We present an approach to rank the candidate Knowledge Base entries based on their similarity to the disease mention. We make use of the Triplet Network for candidate ranking. While the existing methods have used carefully generated sieves and external resources for candidate generation, we introduce a robust and portable candidate generation scheme that does not rely on hand-crafted rules. Experimental results on the standard benchmark NCBI disease dataset demonstrate that our system outperforms the prior methods by a significant margin.

* ClinicalNLP@NAACL 2019
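
As a rough illustration of the candidate-ranking idea in the abstract above, the sketch below shows the standard triplet margin loss and a distance-based ranking of candidates. It is not the paper's model: the two-dimensional vectors, the Euclidean distance, the margin value, and the function names are all assumptions chosen to keep the example small.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def rank_candidates(mention_vec, candidate_vecs):
    """Return candidate indices sorted from closest to farthest."""
    return sorted(range(len(candidate_vecs)),
                  key=lambda i: math.dist(mention_vec, candidate_vecs[i]))


# Hypothetical embeddings: a disease mention, the correct KB entry, a wrong one.
mention = [0.0, 0.0]
correct_entry = [0.0, 1.0]
wrong_entry = [3.0, 4.0]

loss_good = triplet_loss(mention, correct_entry, wrong_entry)  # already separated
loss_bad = triplet_loss(mention, wrong_entry, correct_entry)   # roles swapped
order = rank_candidates(mention, [wrong_entry, correct_entry])
print(loss_good, loss_bad, order)
```

The loss is zero once the correct entry is closer to the mention than the wrong entry by at least the margin, so training only pushes on triplets that are still confusable.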

HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

Dec 18, 2020
Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, Animesh Mukherjee

Hate speech is a challenging issue plaguing online social media. While better models for hate speech detection are continuously being developed, there is little research on the bias and interpretability aspects of hate speech. In this paper, we introduce HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue. Each post in our dataset is annotated from three different perspectives: the basic, commonly used 3-class classification (i.e., hate, offensive or normal), the target community (i.e., the community that has been the victim of hate speech/offensive speech in the post), and the rationales, i.e., the portions of the post on which the annotators' labelling decision (as hate, offensive or normal) is based. We utilize existing state-of-the-art models and observe that even models that perform very well in classification do not score high on explainability metrics like model plausibility and faithfulness. We also observe that models that utilize the human rationales for training perform better in reducing unintended bias towards target communities. We have made our code and dataset public at https://github.com/punyajoy/HateXplain

* 12 pages, 7 figures, 8 tables. Accepted at AAAI 2021

Finding Prerequisite Relations between Concepts using Textbook

Nov 20, 2020
Shivam Pal, Vipul Arora, Pawan Goyal

A prerequisite is anything that you need to know or understand before attempting to learn or understand something new. In the current work, we present a method of finding prerequisite relations between concepts using related textbooks. Previous researchers have focused on finding these relations using the Wikipedia link structure through unsupervised and supervised learning approaches. We propose two methods: a statistical method and a learning-based method. We mine the rich and structured knowledge available in textbooks to find the content for those concepts and the order in which they are discussed. Using this information, the proposed statistical method estimates explicit as well as implicit prerequisite relations between concepts. In our experiments, the proposed statistical method performs better than the popular RefD method, which uses the Wikipedia link structure. The proposed learning-based method shows a significant increase in the efficiency of supervised learning when compared with graph-based and text-based learning approaches.

Operator Inference and Physics-Informed Learning of Low-Dimensional Models for Incompressible Flows

Oct 13, 2020
Peter Benner, Pawan Goyal, Jan Heiland, Igor Pontes Duff

Reduced-order modeling has a long tradition in computational fluid dynamics. The ever-increasing significance of data for the synthesis of low-order models is well reflected in the recent successes of data-driven approaches such as Dynamic Mode Decomposition and Operator Inference. With this work, we suggest a new approach to learning structured low-order models for incompressible flow from data that can be used for engineering studies such as control, optimization, and simulation. To that end, we utilize the intrinsic structure of the Navier-Stokes equations for incompressible flows and show that learning dynamics of the velocity and pressure can be decoupled, thus leading to an efficient operator inference approach for learning the underlying dynamics of incompressible flows. Furthermore, we show the operator inference performance in learning low-order models using two benchmark problems and compare with an intrusive method, namely proper orthogonal decomposition, and other data-driven approaches.

* 23 pages, 14 figures
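
The core of operator inference, as named in the abstract above, is a least-squares fit of low-order model operators to state and time-derivative data. The sketch below is a deliberately tiny scalar toy and not the paper's method: it does not touch the velocity and pressure decoupling, and the model form dx/dt = a*x + h*x^2, the sample values, and the function name are assumptions made for illustration. Derivative samples are assumed to be given.

```python
def fit_linear_quadratic(xs, dxs):
    """Least-squares fit of dx/dt ~ a*x + h*x**2 via the 2x2 normal equations."""
    s11 = sum(x * x for x in xs)       # sum of x^2
    s12 = sum(x ** 3 for x in xs)      # sum of x^3
    s22 = sum(x ** 4 for x in xs)      # sum of x^4
    b1 = sum(x * d for x, d in zip(xs, dxs))
    b2 = sum(x * x * d for x, d in zip(xs, dxs))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("data does not determine both operators")
    a = (b1 * s22 - b2 * s12) / det
    h = (s11 * b2 - s12 * b1) / det
    return a, h


# Hypothetical snapshots generated from a known model with a = -1.0, h = 0.5.
xs = [0.5, 1.0, 1.5, 2.0]
dxs = [-1.0 * x + 0.5 * x * x for x in xs]
a_hat, h_hat = fit_linear_quadratic(xs, dxs)
print(a_hat, h_hat)
```

With noise-free data the fit recovers the generating coefficients up to floating-point error. In realistic settings the state is a reduced vector, the operators are matrices, and the least-squares problem is usually regularized.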

MatScIE: An automated tool for the generation of databases of methods and parameters used in the computational materials science literature

Sep 15, 2020
Souradip Guha, Jatin Agrawal, Swetarekha Ram, Seung-Cheol Lee, Satadeep Bhattacharjee, Pawan Goyal

The number of published articles in the field of materials science is growing rapidly every year. This comparatively unstructured data source, which contains a large amount of information, has a restriction on its re-usability, as the information needed to carry out further calculations using the data in it must be extracted manually. It is very important to obtain valid and contextually correct information from the online (offline) data, as it can be useful not only to generate inputs for further calculations, but also to incorporate them into a querying framework. Retaining this context as a priority, we have developed an automated tool, MatScIE (Material Science Information Extractor), that can extract relevant information from material science literature and make a structured database that is much easier to use for material simulations. Specifically, we extract the material details, methods, code, parameters, and structure from the various research articles. Finally, we created a web application where users can upload published articles, view or download the information obtained from this tool, and create their own databases for personal use.

* 12 pages, 7 figures

Interpretable Neuroevolutionary Models for Learning Non-Differentiable Functions and Programs

Jul 16, 2020
Allan Costa, Rumen Dangovski, Samuel Kim, Pawan Goyal, Marin Soljačić, Joseph Jacobson

A key factor in the modern success of deep learning is the astonishing expressive power of neural networks. However, this comes at the cost of complex, black-boxed models that are unable to extrapolate beyond the domain of the training dataset, conflicting with goals of expressing physical laws or building human-readable programs. In this paper, we introduce OccamNet, a neural network model that can find interpretable, compact and sparse solutions for fitting data, à la Occam's razor. Our model defines a probability distribution over a non-differentiable function space, and we introduce an optimization method that samples functions and updates the weights based on cross-entropy matching in an evolutionary strategy: we train by biasing the probability mass towards better fitting solutions. We demonstrate that we can fit a variety of algorithms, ranging from simple analytic functions through recursive programs to even simple image classification. Our method has a minimal memory footprint, does not require AI accelerators for efficient training, fits complicated functions in minutes of training on a single CPU, and demonstrates significant performance gains when scaled on GPU. Our implementation, demonstrations and instructions for reproducing the experiments are available at https://github.com/AllanSCosta/occam-net.

Logic Constrained Pointer Networks for Interpretable Textual Similarity

Jul 15, 2020
Subhadeep Maji, Rohan Kumar, Manish Bansal, Kalyani Roy, Pawan Goyal

Systematically discovering semantic relationships in text is an important and extensively studied area in Natural Language Processing, with various tasks such as entailment, semantic similarity, etc. Decomposability of sentence-level scores via subsequence alignments has been proposed as a way to make models more interpretable. We study the problem of aligning components of sentences leading to an interpretable model for semantic textual similarity. In this paper, we introduce a novel pointer network based model with a sentinel gating function to align constituent chunks, which are represented using BERT. We improve this base model with a loss function to equally penalize misalignments in both sentences, ensuring the alignments are bidirectional. Finally, to guide the network with structured external knowledge, we introduce first-order logic constraints based on ConceptNet and syntactic knowledge. The model achieves F1 scores of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task, showing large improvements over the existing solutions. Source code is available at https://github.com/manishb89/interpretable_sentence_similarity
