Miguel Ballesteros

On the Evolution of Syntactic Information Encoded by BERT's Contextualized Representations

Feb 10, 2021

Laura Pérez-Mayos, Roberto Carlini, Miguel Ballesteros, Leo Wanner

Abstract: The adaptation of pretrained language models to solve supervised tasks has become a baseline in NLP, and many recent works have focused on studying how linguistic information is encoded in the pretrained sentence representations. Among other information, it has been shown that entire syntax trees are implicitly embedded in the geometry of such models. As these models are often fine-tuned, it becomes increasingly important to understand how the encoded knowledge evolves during fine-tuning. In this paper, we analyze the evolution of the embedded syntax trees during the fine-tuning of BERT for six different tasks, covering all levels of the linguistic structure. Experimental results show that, depending on the task, the encoded syntactic information is forgotten (PoS tagging), reinforced (dependency and constituency parsing) or preserved (semantics-related tasks) in different ways over the course of fine-tuning.

Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings

Jan 26, 2021

Kailash Karthik Saravanakumar, Miguel Ballesteros, Muthu Kumar Chandrasekaran, Kathleen McKeown

Abstract: We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm. Our model uses a combination of sparse and dense document representations, aggregates document-cluster similarity across these multiple representations and makes the clustering decision using a neural classifier. The weighted document-cluster similarity model is learned using a novel adaptation of the triplet loss into a linear classification objective. We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings for clustering. Our model achieves a new state-of-the-art on a standard stream clustering dataset of English documents.

* To appear in Proceedings of The 16th Conference of the European Chapter of the Association for Computational Linguistics
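
The streaming clustering idea in the abstract above can be illustrated with a deliberately simplified sketch. It is not the authors' model: the paper combines sparse and dense representations and makes the decision with a learned neural classifier, whereas this sketch assumes a single dense vector per document, cosine similarity, and a fixed threshold. The function names and the threshold value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def stream_cluster(docs, threshold=0.8):
    """Assign each incoming document vector to a cluster, one document at a time.

    Assumption: a fixed similarity threshold stands in for the learned
    clustering decision described in the abstract.
    """
    centroids = []  # running mean vector of each cluster
    counts = []     # number of documents in each cluster
    labels = []
    for vec in docs:
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            # Join the closest cluster and update its running mean.
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], vec)]
            counts[best] = n + 1
            labels.append(best)
        else:
            # No cluster is similar enough: open a new one (the non-parametric part).
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


print(stream_cluster([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]))
```

Because the number of clusters is not fixed in advance, a new cluster is created whenever no existing centroid is similar enough, which is what makes the procedure suitable for an unbounded news stream.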

To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging

Oct 27, 2020

Kasturi Bhattacharjee, Miguel Ballesteros, Rishita Anubhai, Smaranda Muresan, Jie Ma, Faisal Ladhak, Yaser Al-Onaizan

Abstract: Leveraging large amounts of unlabeled data using Transformer-like architectures, such as BERT, has gained popularity in recent times owing to their effectiveness in learning general representations that can then be further fine-tuned for downstream tasks with much success. However, training these models can be costly from both an economic and an environmental standpoint. In this work, we investigate how to effectively use unlabeled data by exploring the task-specific semi-supervised approach, Cross-View Training (CVT), and comparing it with task-agnostic BERT in multiple settings that include domain- and task-relevant English data. CVT uses a much lighter model architecture, and we show that it achieves similar performance to BERT on a set of sequence tagging tasks, with lower financial and environmental impact.

* Accepted in the Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020) (https://2020.emnlp.org/papers/main)

Linking Entities to Unseen Knowledge Bases with Arbitrary Schemas

Oct 21, 2020

Yogarshi Vyas, Miguel Ballesteros

Abstract: In entity linking, mentions of named entities in raw text are disambiguated against a knowledge base (KB). This work focuses on linking to unseen KBs that do not have training data and whose schema is unknown during training. Our approach relies on methods to flexibly convert entities from arbitrary KBs with several attribute-value pairs into flat strings, which we use in conjunction with state-of-the-art models for zero-shot linking. To improve the generalization of our model, we use two regularization schemes based on shuffling of entity attributes and handling of unseen attributes. Experiments on English datasets where models are trained on the CoNLL dataset and tested on the TAC-KBP 2010 dataset show that our models outperform baseline models by over 12 points of accuracy. Unlike prior work, our approach also allows for seamlessly combining multiple training datasets. We test this ability by adding both a completely different dataset (Wikia) and increasing amounts of training data from the TAC-KBP 2010 training set. Our models perform favorably across the board.
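
As a rough illustration of converting attribute-value pairs into a flat string, with optional attribute shuffling as a regularizer, consider the sketch below. The `[attribute] value` layout, the function name, and the example entity are assumptions made for illustration; the abstract does not specify the exact string format the authors use.

```python
import random


def flatten_entity(attributes, shuffle=False, rng=None):
    """Turn a dict of attribute-value pairs into a single flat string.

    Assumption: each attribute is rendered as "[name] value"; the real
    separator scheme in the paper may differ.
    """
    items = list(attributes.items())
    if shuffle:
        # Shuffling the attribute order discourages a model from relying on
        # any fixed position of an attribute in the string.
        (rng or random).shuffle(items)
    return " ".join(f"[{name}] {value}" for name, value in items)


entity = {"name": "Paris", "type": "city", "country": "France"}  # hypothetical entity
print(flatten_entity(entity))
print(flatten_entity(entity, shuffle=True, rng=random.Random(0)))
```

The unshuffled call keeps the insertion order of the dictionary, while the shuffled call produces the same segments in a random order.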

Transition-based Parsing with Stack-Transformers

Oct 20, 2020

Ramon Fernandez Astudillo, Miguel Ballesteros, Tahira Naseem, Austin Blodgett, Radu Florian

Abstract: Modeling the parser state is key to good performance in transition-based parsing. Recurrent Neural Networks considerably improved the performance of transition-based systems by modeling either the global state (e.g., stack-LSTM parsers) or the local state through contextualized features (e.g., Bi-LSTM parsers). Given the success of Transformer architectures in recent parsing systems, this work explores modifications of the sequence-to-sequence Transformer architecture to model either global or local parser states in transition-based parsing. We show that modifications of the cross-attention mechanism of the Transformer considerably strengthen performance on both dependency and Abstract Meaning Representation (AMR) parsing tasks, particularly for smaller models or limited training data.

* Accepted to Findings of EMNLP 2020, open review https://openreview.net/forum?id=b36spsuUAde, code https://github.com/IBM/transition-amr-parser

Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models

Oct 12, 2020

Ethan Wilcox, Peng Qian, Richard Futrell, Ryosuke Kohita, Roger Levy, Miguel Ballesteros

Abstract: Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts. We assess the ability of modern neural language models to reproduce this behavior in English and evaluate the effect of structural supervision on learning outcomes. First, we assess few-shot learning capabilities by developing controlled experiments that probe models' syntactic nominal number and verbal argument structure generalizations for tokens seen as few as two times during training. Second, we assess invariance properties of learned representations: the ability of a model to transfer syntactic generalizations from a base context (e.g., a simple declarative active-voice sentence) to a transformed context (e.g., an interrogative sentence). We test four models trained on the same dataset: an n-gram baseline, an LSTM, and two LSTM variants trained with explicit structural supervision (Dyer et al., 2016; Charniak et al., 2016). We find that in most cases, the neural models are able to induce the proper syntactic generalizations after minimal exposure, often from just two examples during training, and that the two structurally supervised models generalize more accurately than the LSTM model. All neural models are able to leverage information learned in base contexts to drive expectations in transformed contexts, indicating that they have learned some invariance properties of syntax.

* To appear at EMNLP 2020

Resource-Enhanced Neural Model for Event Argument Extraction

Oct 06, 2020

Jie Ma, Shuai Wang, Rishita Anubhai, Miguel Ballesteros, Yaser Al-Onaizan

Abstract: Event argument extraction (EAE) aims to identify the arguments of an event and classify the roles that those arguments play. Despite great efforts made in prior work, there remain many challenges: (1) data scarcity; (2) capturing long-range dependencies, specifically, the connection between an event trigger and a distant event argument; and (3) integrating event trigger information into candidate argument representations. For (1), we explore using unlabeled data in different ways. For (2), we propose to use a syntax-attending Transformer that can utilize dependency parses to guide the attention mechanism. For (3), we propose a trigger-aware sequence encoder with several types of trigger-dependent sequence representations. We also support argument extraction either from text annotated with gold entities or from plain text. Experiments on the English ACE2005 benchmark show that our approach achieves a new state-of-the-art.

* Findings of EMNLP 2020

Severing the Edge Between Before and After: Neural Architectures for Temporal Ordering of Events

Apr 08, 2020

Miguel Ballesteros, Rishita Anubhai, Shuai Wang, Nima Pourdamghani, Yogarshi Vyas, Jie Ma, Parminder Bhatia, Kathleen McKeown, Yaser Al-Onaizan

Abstract: In this paper, we propose a neural architecture and a set of training methods for ordering events by predicting temporal relations. Our proposed models receive a pair of events within a span of text as input and identify the temporal relation (Before, After, Equal, Vague) between them. Given that a key challenge with this task is the scarcity of annotated data, our models rely on pretrained representations (i.e., RoBERTa, BERT or ELMo), transfer and multi-task learning (by leveraging complementary datasets), and self-training techniques. Experiments on the MATRES dataset of English documents establish a new state-of-the-art on this task.

Transition-Based Dependency Parsing using Perceptron Learner

Jan 28, 2020

Rahul Radhakrishnan Iyer, Miguel Ballesteros, Chris Dyer, Robert Frederking

Abstract: Syntactic parsing using dependency structures has become a standard technique in natural language processing with many different parsing models, in particular data-driven models that can be trained on syntactically annotated corpora. In this paper, we tackle transition-based dependency parsing using a Perceptron Learner. Our proposed model, which adds more relevant features to the Perceptron Learner, outperforms a baseline arc-standard parser. We beat the UAS of the MALT and LSTM parsers. We also give possible ways to address parsing of non-projective trees.

* This was part of an assignment at my graduate course at LTI. This does not offer any major novelties
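
For readers unfamiliar with the arc-standard transition system mentioned in the abstract above, the following minimal sketch applies a given sequence of transitions to build a dependency tree. It covers only the transition system, not the perceptron that would score and choose transitions, and it handles unlabeled arcs only. The function name, the transition labels, and the example sentence are illustrative assumptions rather than details from the paper.

```python
def arc_standard(n_words, transitions):
    """Apply arc-standard transitions and return a {dependent: head} map.

    Words are numbered 1..n_words; index 0 is an artificial root.
    """
    stack = [0]
    buffer = list(range(1, n_words + 1))
    heads = {}
    for action in transitions:
        if action == "SHIFT":
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The second-from-top item becomes a dependent of the top item.
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == "RIGHT-ARC":
            # The top item becomes a dependent of the item beneath it.
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown transition: {action}")
    return heads


# Hypothetical example: "She eats fish" with words numbered 1, 2, 3.
sequence = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
print(arc_standard(3, sequence))
```

In a trained parser, a classifier such as a perceptron would pick the next transition from features of the stack and buffer instead of reading it from a fixed list.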

Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning

May 31, 2019

Tahira Naseem, Abhishek Shah, Hui Wan, Radu Florian, Salim Roukos, Miguel Ballesteros

Abstract: Our work involves enriching the Stack-LSTM transition-based AMR parser (Ballesteros and Al-Onaizan, 2017) by augmenting training with Policy Learning and rewarding the Smatch score of sampled graphs. In addition, we combine several AMR-to-text alignments with an attention mechanism and supplement the parser with pre-processed concept identification, named entities and contextualized embeddings. We achieve a highly competitive performance that is comparable to the best published results. We show an in-depth study ablating each of the new components of the parser.

* Accepted as short paper at ACL 2019
