Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

"Information Extraction": models, code, and papers

Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation

Oct 25, 2023
Yongxin Shi, Dezhi Peng, Wenhui Liao, Zening Lin, Xinhong Chen, Chongyu Liu, Yuyi Zhang, Lianwen Jin

This paper presents a comprehensive evaluation of the Optical Character Recognition (OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model (LMM). We assess the model's performance across a range of OCR tasks, including scene text recognition, handwritten text recognition, handwritten mathematical expression recognition, table structure recognition, and information extraction from visually-rich document. The evaluation reveals that GPT-4V performs well in recognizing and understanding Latin contents, but struggles with multilingual scenarios and complex tasks. Based on these observations, we delve deeper into the necessity of specialized OCR models and deliberate on the strategies to fully harness the pretrained general LMMs like GPT-4V for OCR downstream tasks. The study offers a critical reference for future research in OCR with LMMs. Evaluation pipeline and results are available at https://github.com/SCUT-DLVCLab/GPT-4V_OCR.

Via

Access Paper or Ask Questions

Easy-to-Hard Learning for Information Extraction

May 19, 2023
Chang Gao, Wenxuan Zhang, Wai Lam, Lidong Bing

Figure 1 for Easy-to-Hard Learning for Information Extraction

Figure 2 for Easy-to-Hard Learning for Information Extraction

Figure 3 for Easy-to-Hard Learning for Information Extraction

Figure 4 for Easy-to-Hard Learning for Information Extraction

Information extraction (IE) systems aim to automatically extract structured information, such as named entities, relations between entities, and events, from unstructured texts. While most existing work addresses a particular IE task, universally modeling various IE tasks with one model has achieved great success recently. Despite their success, they employ a one-stage learning strategy, i.e., directly learning to extract the target structure given the input text, which contradicts the human learning process. In this paper, we propose a unified easy-to-hard learning framework consisting of three stages, i.e., the easy stage, the hard stage, and the main stage, for IE by mimicking the human learning process. By breaking down the learning process into multiple stages, our framework facilitates the model to acquire general IE task knowledge and improve its generalization ability. Extensive experiments across four IE tasks demonstrate the effectiveness of our framework. We achieve new state-of-the-art results on 13 out of 17 datasets. Our code is available at \url{https://github.com/DAMO-NLP-SG/IE-E2H}.

* Findings of ACL 2023

Via

Access Paper or Ask Questions

Weakly supervised information extraction from inscrutable handwritten document images

Jun 12, 2023
Sujoy Paul, Gagan Madan, Akankshya Mishra, Narayan Hegde, Pradeep Kumar, Gaurav Aggarwal

Figure 1 for Weakly supervised information extraction from inscrutable handwritten document images

Figure 2 for Weakly supervised information extraction from inscrutable handwritten document images

Figure 3 for Weakly supervised information extraction from inscrutable handwritten document images

Figure 4 for Weakly supervised information extraction from inscrutable handwritten document images

State-of-the-art information extraction methods are limited by OCR errors. They work well for printed text in form-like documents, but unstructured, handwritten documents still remain a challenge. Adapting existing models to domain-specific training data is quite expensive, because of two factors, 1) limited availability of the domain-specific documents (such as handwritten prescriptions, lab notes, etc.), and 2) annotations become even more challenging as one needs domain-specific knowledge to decode inscrutable handwritten document images. In this work, we focus on the complex problem of extracting medicine names from handwritten prescriptions using only weakly labeled data. The data consists of images along with the list of medicine names in it, but not their location in the image. We solve the problem by first identifying the regions of interest, i.e., medicine lines from just weak labels and then injecting a domain-specific medicine language model learned using only synthetically generated data. Compared to off-the-shelf state-of-the-art methods, our approach performs >2.5x better in medicine names extraction from prescriptions.

* Accepted at ICDAR 2023

Via

Access Paper or Ask Questions

Leveraging Open Information Extraction for Improving Few-Shot Trigger Detection Domain Transfer

May 23, 2023
David Dukić, Kiril Gashteovski, Goran Glavaš, Jan Šnajder

Figure 1 for Leveraging Open Information Extraction for Improving Few-Shot Trigger Detection Domain Transfer

Figure 2 for Leveraging Open Information Extraction for Improving Few-Shot Trigger Detection Domain Transfer

Figure 3 for Leveraging Open Information Extraction for Improving Few-Shot Trigger Detection Domain Transfer

Figure 4 for Leveraging Open Information Extraction for Improving Few-Shot Trigger Detection Domain Transfer

Event detection is a crucial information extraction task in many domains, such as Wikipedia or news. The task typically relies on trigger detection (TD) -- identifying token spans in the text that evoke specific events. While the notion of triggers should ideally be universal across domains, domain transfer for TD from high- to low-resource domains results in significant performance drops. We address the problem of negative transfer for TD by coupling triggers between domains using subject-object relations obtained from a rule-based open information extraction (OIE) system. We demonstrate that relations injected through multi-task training can act as mediators between triggers in different domains, enhancing zero- and few-shot TD domain transfer and reducing negative transfer, in particular when transferring from a high-resource source Wikipedia domain to a low-resource target news domain. Additionally, we combine the extracted relations with masked language modeling on the target domain and obtain further TD performance gains. Finally, we demonstrate that the results are robust to the choice of the OIE system.

Via

Access Paper or Ask Questions

Probing Representations for Document-level Event Extraction

Oct 23, 2023
Barry Wang, Xinya Du, Claire Cardie

The probing classifiers framework has been employed for interpreting deep neural network models for a variety of natural language processing (NLP) applications. Studies, however, have largely focused on sentencelevel NLP tasks. This work is the first to apply the probing paradigm to representations learned for document-level information extraction (IE). We designed eight embedding probes to analyze surface, semantic, and event-understanding capabilities relevant to document-level event extraction. We apply them to the representations acquired by learning models from three different LLM-based document-level IE approaches on a standard dataset. We found that trained encoders from these models yield embeddings that can modestly improve argument detections and labeling but only slightly enhance event-level tasks, albeit trade-offs in information helpful for coherence and event-type prediction. We further found that encoder models struggle with document length and cross-sentence discourse.

* To appear in EMNLP 2023 Findings

Via

Access Paper or Ask Questions

A Review of Reinforcement Learning for Natural Language Processing, and Applications in Healthcare

Oct 23, 2023
Ying Liu, Haozhu Wang, Huixue Zhou, Mingchen Li, Yu Hou, Sicheng Zhou, Fang Wang, Rama Hoetzlein, Rui Zhang

Reinforcement learning (RL) has emerged as a powerful approach for tackling complex medical decision-making problems such as treatment planning, personalized medicine, and optimizing the scheduling of surgeries and appointments. It has gained significant attention in the field of Natural Language Processing (NLP) due to its ability to learn optimal strategies for tasks such as dialogue systems, machine translation, and question-answering. This paper presents a review of the RL techniques in NLP, highlighting key advancements, challenges, and applications in healthcare. The review begins by visualizing a roadmap of machine learning and its applications in healthcare. And then it explores the integration of RL with NLP tasks. We examined dialogue systems where RL enables the learning of conversational strategies, RL-based machine translation models, question-answering systems, text summarization, and information extraction. Additionally, ethical considerations and biases in RL-NLP systems are addressed.

Via

Access Paper or Ask Questions

CAW-coref: Conjunction-Aware Word-level Coreference Resolution

Oct 19, 2023
Karel D'Oosterlinck, Semere Kiros Bitew, Brandon Papineau, Christopher Potts, Thomas Demeester, Chris Develder

State-of-the-art coreference resolutions systems depend on multiple LLM calls per document and are thus prohibitively expensive for many use cases (e.g., information extraction with large corpora). The leading word-level coreference system (WL-coref) attains 96.6% of these SOTA systems' performance while being much more efficient. In this work, we identify a routine yet important failure case of WL-coref: dealing with conjoined mentions such as 'Tom and Mary'. We offer a simple yet effective solution that improves the performance on the OntoNotes test set by 0.9% F1, shrinking the gap between efficient word-level coreference resolution and expensive SOTA approaches by 34.6%. Our Conjunction-Aware Word-level coreference model (CAW-coref) and code is available at https://github.com/KarelDO/wl-coref.

* Accepted at CRAC 2023

Via

Access Paper or Ask Questions

Hierarchy Builder: Organizing Textual Spans into a Hierarchy to Facilitate Navigation

Sep 18, 2023
Itay Yair, Hillel Taub-Tabib, Yoav Goldberg

Figure 1 for Hierarchy Builder: Organizing Textual Spans into a Hierarchy to Facilitate Navigation

Figure 2 for Hierarchy Builder: Organizing Textual Spans into a Hierarchy to Facilitate Navigation

Figure 3 for Hierarchy Builder: Organizing Textual Spans into a Hierarchy to Facilitate Navigation

Figure 4 for Hierarchy Builder: Organizing Textual Spans into a Hierarchy to Facilitate Navigation

Information extraction systems often produce hundreds to thousands of strings on a specific topic. We present a method that facilitates better consumption of these strings, in an exploratory setting in which a user wants to both get a broad overview of what's available, and a chance to dive deeper on some aspects. The system works by grouping similar items together and arranging the remaining items into a hierarchical navigable DAG structure. We apply the method to medical information extraction.

* Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), Pages 282-290, Toronto, Canada, July 2023
* 9 pages including citations; Presented at the ACL 2023 DEMO track, pages 282-290

Via

Access Paper or Ask Questions

Rebuild City Buildings from Off-Nadir Aerial Images with Offset-Building Model (OBM)

Nov 03, 2023
Kai Li, Yupeng Deng, Yunlong Kong, Diyou Liu, Jingbo Chen, Yu Meng, Junxian Ma

Accurate measurement of the offset from roof-to-footprint in very-high-resolution remote sensing imagery is crucial for urban information extraction tasks. With the help of deep learning, existing methods typically rely on two-stage CNN models to extract regions of interest on building feature maps. At the first stage, a Region Proposal Network (RPN) is applied to extract thousands of ROIs (Region of Interests) which will post-imported into a Region-based Convolutional Neural Networks (RCNN) to extract wanted information. However, because of inflexible RPN, these methods often lack effective user interaction, encounter difficulties in instance correspondence, and struggle to keep up with the advancements in general artificial intelligence. This paper introduces an interactive Transformer model combined with a prompt encoder to precisely extract building segmentation as well as the offset vectors from roofs to footprints. In our model, a powerful module, namely ROAM, was tailored for common problems in predicting roof-to-footprint offsets. We tested our model's feasibility on the publicly available BONAI dataset, achieving a significant reduction in Prompt-Instance-Level offset errors ranging from 14.6% to 16.3%. Additionally, we developed a Distance-NMS algorithm tailored for large-scale building offsets, significantly enhancing the accuracy of predicted building offset angles and lengths in a straightforward and efficient manner. To further validate the model's robustness, we created a new test set using 0.5m remote sensing imagery from Huizhou, China, for inference testing. Our code, training methods, and the updated dataset will be accessable at https://github.com/likaiucas.

* this paper will be rewritten in an entirely different direction in a more detailed and understandable way. Another edition will first send to a conference

Via

Access Paper or Ask Questions