Vlad Morariu

End-to-end Document Recognition and Understanding with Dessurt

Mar 30, 2022
Brian Davis, Bryan Morse, Bryan Price, Chris Tensmeyer, Curtis Wigington, Vlad Morariu

We introduce Dessurt, a relatively simple document understanding transformer capable of being fine-tuned on a greater variety of document tasks than prior methods. It receives a document image and task string as input and generates arbitrary text autoregressively as output. Because Dessurt is an end-to-end architecture that performs text recognition in addition to document understanding, it does not require an external recognition model as prior methods do, making it easier to fine-tune to new visual domains. We show that this model is effective at 9 different dataset-task combinations.

IGA : An Intent-Guided Authoring Assistant

Apr 14, 2021
Simeng Sun, Wenlong Zhao, Varun Manjunatha, Rajiv Jain, Vlad Morariu, Franck Dernoncourt, Balaji Vasan Srinivasan, Mohit Iyyer

While large-scale pretrained language models have significantly improved writing assistance functionalities such as autocomplete, more complex and controllable writing assistants have yet to be explored. We leverage advances in language modeling to build an interactive writing assistant that generates and rephrases text according to fine-grained author specifications. Users provide input to our Intent-Guided Assistant (IGA) in the form of text interspersed with tags that correspond to specific rhetorical directives (e.g., adding description or contrast, or rephrasing a particular sentence). We fine-tune a language model on a dataset heuristically labeled with author intent, which allows IGA to fill in these tags with generated text that users can subsequently edit to their liking. A series of automatic and crowdsourced evaluations confirms the quality of IGA's generated outputs, while a small-scale user study demonstrates author preference for IGA over baseline methods in a creative writing task. We release our dataset, code, and demo to spur further research into AI-assisted writing.

* 13 pages 