
AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification

Sep 18, 2023
Abdelrahman Abdallah, Mahmoud Abdalla, Mohamed Elkasaby, Yasser Elbendary, Adam Jatowt



Key information extraction involves recognizing and extracting text from scanned receipts, retrieving their essential content, and organizing it into structured documents. This paper presents a novel multilingual dataset for receipt extraction, addressing key challenges in information extraction and item classification. The dataset comprises 47,720 samples, including annotations for item names, attributes such as price and brand, and classification into 44 product categories. We introduce the InstructLLaMA approach, achieving an F1 score of 0.76 and an accuracy of 0.68 for key information extraction and item classification. We provide code, datasets, and checkpoints at https://github.com/Update-For-Integrated-Business-AI/AMuRD.

