Julia Hockenmaier

A Framework for Bidirectional Decoding: Case Study in Morphological Inflection

May 21, 2023
Marc E. Canby, Julia Hockenmaier

Transformer-based encoder-decoder models that generate outputs in a left-to-right fashion have become standard for sequence-to-sequence tasks. In this paper, we propose a framework for decoding that produces sequences from the "outside-in": at each step, the model chooses to generate a token on the left, generate a token on the right, or join the left and right sequences. We argue that this is more principled than prior bidirectional decoders. Our proposal supports a variety of model architectures and includes several training methods, such as a dynamic programming algorithm that marginalizes out the latent ordering variable. Our model improves considerably over a unidirectional transformer baseline on the SIGMORPHON 2023 inflection task and sets a new state of the art on the 2022 shared task. The model performs particularly well on long sequences, can learn the split point of words composed of a stem and an affix without supervision, and performs better relative to the baseline on datasets with fewer unique lemmas (but more examples per lemma).
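
A minimal sketch of the outside-in control flow described above, assuming a scorer over the three action types; the scorer, action names, and toy example are placeholders rather than the paper's model or training procedure.

```python
# Minimal sketch of greedy "outside-in" decoding, NOT the authors' implementation.
# `score_actions` is a stand-in for a trained model that, given the current left
# prefix and right suffix, scores every (action, token) pair, where the actions
# are: emit a token on the left, emit a token on the right, or JOIN.

from typing import Callable, Dict, List, Tuple

LEFT, RIGHT, JOIN = "LEFT", "RIGHT", "JOIN"
Action = Tuple[str, str]  # (action, token); token is "" for JOIN

def decode_outside_in(
    score_actions: Callable[[List[str], List[str]], Dict[Action, float]],
    max_len: int = 50,
) -> List[str]:
    left: List[str] = []
    right: List[str] = []  # stored in final (left-to-right) order
    for _ in range(max_len):
        scores = score_actions(left, right)
        (action, token), _ = max(scores.items(), key=lambda kv: kv[1])
        if action == JOIN:
            break
        if action == LEFT:
            left.append(token)       # grow the output from its left end
        else:
            right.insert(0, token)   # grow the output from its right end
    return left + right

# Toy scorer that spells "un" + "lock" + "ed" outside-in, just to show the control flow.
def toy_scorer(left: List[str], right: List[str]) -> Dict[Action, float]:
    plan = {(0, 0): (LEFT, "un"), (1, 0): (RIGHT, "ed"), (1, 1): (LEFT, "lock")}
    key = (len(left), len(right))
    return {plan[key]: 1.0} if key in plan else {(JOIN, ""): 1.0}

print(decode_outside_in(toy_scorer))  # ['un', 'lock', 'ed']
```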

Multimedia Generative Script Learning for Task Planning

Aug 25, 2022
Qingyun Wang, Manling Li, Hou Pong Chan, Lifu Huang, Julia Hockenmaier, Girish Chowdhary, Heng Ji

Goal-oriented generative script learning aims to generate subsequent steps based on a goal, an essential task for assisting robots in performing stereotypical activities of daily life. We show that performance on this task can be improved if historical states are captured not only by the linguistic instructions given to people but also by the additional information provided by accompanying images. We therefore propose a new task, Multimedia Generative Script Learning, to generate subsequent steps by tracking historical states in both the text and vision modalities, and we present the first benchmark containing 2,338 tasks and 31,496 steps with descriptive images. We aim to generate scripts that are visual-state trackable, inductive for unseen tasks, and diverse in their individual steps. We propose to encode visual state changes with a multimedia selective encoder, transfer knowledge from previously observed tasks with a retrieval-augmented decoder, and present the distinct information at each step by optimizing a diversity-oriented contrastive learning objective. We define metrics to evaluate both generation quality and inductive quality. Experimental results demonstrate that our approach significantly outperforms strong baselines.
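
For readers unfamiliar with contrastive objectives, the sketch below shows one generic InfoNCE-style loss over step representations; it illustrates the flavor of a diversity-oriented contrastive term and is not the paper's exact objective.

```python
# Illustrative sketch: push the predicted next-step representation towards its gold
# step and away from the other steps in the batch.
import torch
import torch.nn.functional as F

def step_contrastive_loss(pred: torch.Tensor, gold: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """pred, gold: [batch, dim] step embeddings; positives lie on the diagonal."""
    pred = F.normalize(pred, dim=-1)
    gold = F.normalize(gold, dim=-1)
    logits = pred @ gold.t() / temperature            # [batch, batch] similarity matrix
    targets = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, targets)

# Usage with random tensors, just to show the shapes involved.
loss = step_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(float(loss))
```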

Human-guided Collaborative Problem Solving: A Natural Language based Framework

Jul 19, 2022
Harsha Kokel, Mayukh Das, Rakibul Islam, Julia Bonn, Jon Cai, Soham Dan, Anjali Narayan-Chen, Prashant Jayannavar, Janardhan Rao Doppa, Julia Hockenmaier, Sriraam Natarajan, Martha Palmer, Dan Roth

We consider the problem of human-machine collaborative problem solving as a planning task coupled with natural language communication. Our framework consists of three components: a natural language engine that parses language utterances into a formal representation and vice versa, a concept learner that induces generalized concepts for plans based on limited interactions with the user, and an HTN planner that solves the task based on human interaction. We illustrate the ability of this framework to address the key challenges of collaborative problem solving by demonstrating it on a collaborative building task in a Minecraft-based blocksworld domain. The accompanying demo video is available at https://youtu.be/q1pWe4aahF0.

* ICAPS 2021 (demo track) 
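
A hypothetical sketch of how the three components could be wired together; all class names, method signatures, and the toy parser/planner logic are illustrative (the concept learner is omitted for brevity) and do not reflect the demo's actual API.

```python
# Hypothetical wiring of an NL engine and an HTN planner; not the system's real code.
from dataclasses import dataclass
from typing import List

@dataclass
class Goal:
    predicate: str
    args: List[str]

class LanguageEngine:
    def parse(self, utterance: str) -> Goal:
        # Stand-in parser: "build tower red blue" -> Goal("build_tower", ["red", "blue"])
        tokens = utterance.lower().split()
        return Goal(predicate="_".join(tokens[:2]), args=tokens[2:])

    def generate(self, message: str) -> str:
        return f"Agent: {message}"

class HTNPlanner:
    def plan(self, goal: Goal) -> List[str]:
        # Stand-in decomposition: one primitive action per argument.
        return [f"place({block})" for block in goal.args]

def collaborate(utterance: str) -> List[str]:
    engine, planner = LanguageEngine(), HTNPlanner()
    goal = engine.parse(utterance)   # NL -> formal representation
    steps = planner.plan(goal)       # HTN decomposition into primitive actions
    print(engine.generate(f"Executing {len(steps)} steps for {goal.predicate}"))
    return steps

print(collaborate("Build tower red blue green"))  # ['place(red)', 'place(blue)', 'place(green)']
```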

HySPA: Hybrid Span Generation for Scalable Text-to-Graph Extraction

Jun 30, 2021
Liliang Ren, Chenkai Sun, Heng Ji, Julia Hockenmaier

Text-to-Graph extraction aims to automatically extract information graphs, consisting of mentions and types, from natural language texts. Existing approaches, such as table filling and pairwise scoring, have shown impressive performance on various information extraction tasks, but they are difficult to scale to datasets with longer input texts because of their second-order space/time complexity with respect to input length. In this work, we propose a Hybrid Span Generator (HySPA) that invertibly maps the information graph to an alternating sequence of nodes and edge types, and directly generates such sequences via a hybrid span decoder that decodes both spans and types recurrently in linear time and space. Extensive experiments on the ACE05 dataset show that our approach significantly outperforms the state of the art on the joint entity and relation extraction task.

* Accepted by ACL 2021 Findings 
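
The sketch below illustrates the general idea of an invertible graph linearization into an alternating node/edge-type sequence; the real HySPA encoding operates over spans and is considerably more involved, so treat this purely as an illustration.

```python
# Toy invertible linearization of an information graph; not the HySPA encoding itself.
from typing import Dict, List, Tuple

Graph = Dict[Tuple[str, str], str]  # (head mention, tail mention) -> relation type

def linearize(graph: Graph) -> List[str]:
    seq: List[str] = []
    for (head, tail), rel in sorted(graph.items()):
        seq += [head, rel, tail, "[SEP]"]   # node, edge type, node, separator
    return seq

def delinearize(seq: List[str]) -> Graph:
    graph: Graph = {}
    for i in range(0, len(seq), 4):
        head, rel, tail, _ = seq[i:i + 4]
        graph[(head, tail)] = rel
    return graph

g = {("John Smith", "Acme Corp"): "ORG-AFF", ("Acme Corp", "Chicago"): "PART-WHOLE"}
assert delinearize(linearize(g)) == g       # the mapping is invertible by construction
print(linearize(g))
```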

A Multi-Perspective Architecture for Semantic Code Search

May 06, 2020
Rajarshi Haldar, Lingfei Wu, Jinjun Xiong, Julia Hockenmaier

The ability to match pieces of code to their corresponding natural language descriptions and vice versa is fundamental for natural language search interfaces to software repositories. In this paper, we propose a novel multi-perspective cross-lingual neural framework for code-text matching, inspired in part by a previous model for monolingual text-to-text matching, to capture both global and local similarities. Our experiments on the CoNaLa dataset show that our proposed model yields better performance on this cross-lingual text-to-code matching task than previous approaches that map code and text to a single joint embedding space.

* ACL 2020 
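
As a rough illustration of combining global (sequence-level) and local (token-level) similarities for code-text matching, the following sketch scores a query against a code snippet using random embeddings; it is not the paper's architecture.

```python
# Generic global + local similarity combination for code-text matching (illustrative only).
import torch
import torch.nn.functional as F

def global_similarity(code: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
    # Mean-pool each sequence, then take the cosine similarity of the pooled vectors.
    return F.cosine_similarity(code.mean(dim=0, keepdim=True),
                               text.mean(dim=0, keepdim=True)).squeeze()

def local_similarity(code: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
    # For every text token, find its best-matching code token, then average.
    sim = F.normalize(text, dim=-1) @ F.normalize(code, dim=-1).t()  # [text_len, code_len]
    return sim.max(dim=1).values.mean()

code_tokens = torch.randn(40, 128)   # e.g. contextual embeddings of a code snippet
text_tokens = torch.randn(12, 128)   # e.g. contextual embeddings of a query
score = 0.5 * global_similarity(code_tokens, text_tokens) \
      + 0.5 * local_similarity(code_tokens, text_tokens)
print(float(score))
```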

Phrase Grounding by Soft-Label Chain Conditional Random Field

Sep 01, 2019
Jiacheng Liu, Julia Hockenmaier

The phrase grounding task aims to ground each entity mention in a given caption of an image to a corresponding region in that image. Although there are clear dependencies between how different mentions of the same caption should be grounded, previous structured prediction methods that aim to capture such dependencies need to resort to approximate inference or non-differentiable losses. In this paper, we formulate phrase grounding as a sequence labeling task in which we treat candidate regions as potential labels and use neural chain Conditional Random Fields (CRFs) to model dependencies among regions for adjacent mentions. In contrast to standard sequence labeling tasks, the phrase grounding task is defined such that there may be multiple correct candidate regions. To address this multiplicity of gold labels, we define so-called Soft-Label Chain CRFs and present an algorithm that enables convenient end-to-end training. Our method establishes a new state of the art for phrase grounding on the Flickr30k Entities dataset. Analysis shows that our model benefits both from the entity dependencies captured by the CRF and from the soft-label training regime. Our code is available at github.com/liujch1998/SoftLabelCCRF.

* 11 pages, 5 figures, accepted by EMNLP-IJCNLP 2019 
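
The sketch below shows one standard way to train a chain CRF when several labels are acceptable at a position: marginalize over all admissible label sequences with a constrained forward pass. The actual Soft-Label Chain CRF uses soft label distributions (see the linked repository), so this is only a related illustration.

```python
# Partial-label chain CRF loss via constrained forward passes (illustrative sketch).
import torch

def forward_logZ(emissions: torch.Tensor, transitions: torch.Tensor,
                 mask: torch.Tensor = None) -> torch.Tensor:
    """Log partition function of a linear-chain CRF.
    emissions: [seq_len, num_labels]; transitions: [num_labels, num_labels].
    If mask is given ([seq_len, num_labels], True = admissible), disallowed labels are
    excluded, so the same routine also scores the gold-admissible subset."""
    neg_inf = torch.tensor(-1e9)
    scores = emissions[0].clone()
    if mask is not None:
        scores = torch.where(mask[0], scores, neg_inf)
    for t in range(1, emissions.size(0)):
        step = scores.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        scores = torch.logsumexp(step, dim=0)
        if mask is not None:
            scores = torch.where(mask[t], scores, neg_inf)
    return torch.logsumexp(scores, dim=0)

def partial_label_crf_loss(emissions, transitions, gold_mask):
    # NLL that marginalizes over every label sequence consistent with the gold sets.
    return forward_logZ(emissions, transitions) - forward_logZ(emissions, transitions, gold_mask)

L, K = 5, 4                                   # 5 mentions, 4 candidate regions
emissions = torch.randn(L, K)
transitions = torch.randn(K, K)
gold_mask = torch.rand(L, K) > 0.5            # several acceptable regions per mention
gold_mask[:, 0] = True                        # ensure at least one admissible label
print(float(partial_label_crf_loss(emissions, transitions, gold_mask)))
```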

Natural Language Inference from Multiple Premises

Oct 09, 2017
Alice Lai, Yonatan Bisk, Julia Hockenmaier

We define a novel textual entailment task that requires inference over multiple premise sentences. We present a new dataset for this task that minimizes trivial lexical inferences, emphasizes knowledge of everyday events, and presents a more challenging setting for textual entailment. We evaluate several strong neural baselines and analyze how the multiple premise task differs from standard textual entailment.

* Accepted at IJCNLP 2017 
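
A hypothetical representation of a multiple-premise example, together with the naive reduction to standard single-premise entailment by concatenation; the field names and example sentences are illustrative, not drawn from the dataset.

```python
# Hypothetical data format for multiple-premise entailment and a naive reduction.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MultiPremiseExample:
    premises: List[str]          # several independent premise sentences
    hypothesis: str
    label: str                   # e.g. "entailment", "neutral", "contradiction"

def as_single_premise(example: MultiPremiseExample) -> Tuple[str, str]:
    """Reduce to a standard premise/hypothesis pair by concatenation, the most naive
    baseline one could feed to an off-the-shelf entailment model."""
    return " ".join(example.premises), example.hypothesis

ex = MultiPremiseExample(
    premises=["A man is chopping vegetables.", "A pot is boiling on the stove."],
    hypothesis="Someone is cooking a meal.",
    label="entailment",
)
print(as_single_premise(ex))
```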

Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues

Aug 09, 2017
Bryan A. Plummer, Arun Mallya, Christopher M. Cervantes, Julia Hockenmaier, Svetlana Lazebnik

This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues. We model the appearance, size, and position of entity bounding boxes, adjectives that carry attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions. Special attention is given to relationships between people and clothing or body-part mentions, as they are useful for distinguishing individuals. We automatically learn weights for combining these cues and, at test time, perform joint inference over all phrases in a caption. The resulting system produces state-of-the-art performance on phrase localization on the Flickr30k Entities dataset and on visual relationship detection on the Stanford VRD dataset.

* IEEE ICCV 2017 accepted paper 
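
As a toy illustration of scoring candidate boxes by a weighted combination of cues (the joint inference over all phrases in a caption is omitted), consider the sketch below; the cue names, scores, and weights are placeholders.

```python
# Toy weighted cue combination for choosing a box for one phrase (illustrative only).
from typing import Dict, List

def score_box(cue_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    # Linear combination of whatever cues fire for this phrase/box pair.
    return sum(weights.get(cue, 0.0) * s for cue, s in cue_scores.items())

def localize(candidates: List[Dict[str, float]], weights: Dict[str, float]) -> int:
    scores = [score_box(c, weights) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])

weights = {"appearance": 1.0, "size": 0.3, "position": 0.2, "attribute": 0.5}
candidates = [
    {"appearance": 0.7, "size": 0.9},                     # box 0
    {"appearance": 0.8, "size": 0.4, "attribute": 0.9},   # box 1: also matches the attribute cue
]
print(localize(candidates, weights))   # -> 1
```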

Evaluating Induced CCG Parsers on Grounded Semantic Parsing

Jan 31, 2017
Yonatan Bisk, Siva Reddy, John Blitzer, Julia Hockenmaier, Mark Steedman

We compare the effectiveness of four different syntactic CCG parsers for a semantic slot-filling task to explore how much syntactic supervision is required for downstream semantic analysis. This extrinsic, task-based evaluation provides a unique window to explore the strengths and weaknesses of semantics captured by unsupervised grammar induction systems. We release a new Freebase semantic parsing dataset called SPADES (Semantic PArsing of DEclarative Sentences) containing 93K cloze-style questions paired with answers. We evaluate all our models on this dataset. Our code and data are available at https://github.com/sivareddyg/graph-parser.

* EMNLP 2016, Table 2 erratum, Code and Freebase Semantic Parsing data URL 

Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

Sep 19, 2016
Bryan A. Plummer, Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, Svetlana Lazebnik

The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains, linking mentions of the same entities across different captions for the same image and associating them with 276k manually annotated bounding boxes. Such annotations are essential for continued progress in automatic image description and grounded language understanding. They enable us to define a new benchmark for localization of textual entity mentions in an image. We present a strong baseline for this task that combines an image-text embedding, detectors for common objects, a color classifier, and a bias towards selecting larger objects. While our baseline rivals more complex state-of-the-art models in accuracy, we show that its gains cannot easily be parlayed into improvements on tasks such as image-sentence retrieval, underlining the limitations of current methods and the need for further research.
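
A hedged sketch of the flavor of such a baseline: rank region proposals for a phrase by embedding similarity plus a mild bias towards larger boxes. The embedding function, weights, and candidate boxes below are stand-ins, not the paper's trained components.

```python
# Illustrative region ranking: embedding similarity + larger-object bias (not the paper's model).
import numpy as np

def box_area_fraction(box, image_wh):
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / (image_wh[0] * image_wh[1])

def rank_regions(phrase_emb, region_embs, boxes, image_wh, size_weight=0.1):
    # Cosine similarity between the phrase embedding and each region embedding...
    sims = region_embs @ phrase_emb / (
        np.linalg.norm(region_embs, axis=1) * np.linalg.norm(phrase_emb) + 1e-8
    )
    sizes = np.array([box_area_fraction(b, image_wh) for b in boxes])
    scores = sims + size_weight * sizes              # ...plus a mild larger-object bias
    return np.argsort(-scores)                       # best-scoring regions first

phrase_emb = np.random.randn(64)
region_embs = np.random.randn(5, 64)                 # 5 candidate regions
boxes = [(0, 0, 100, 100), (50, 50, 400, 300), (10, 10, 30, 30),
         (0, 0, 640, 480), (200, 100, 260, 220)]
print(rank_regions(phrase_emb, region_embs, boxes, image_wh=(640, 480)))
```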
