Alert button
Picture for Julián Moreno-Schneider

Julián Moreno-Schneider

Alert button

Towards Discourse Parsing-inspired Semantic Storytelling

Apr 25, 2020
Georg Rehm, Karolina Zaczynska, Julián Moreno-Schneider, Malte Ostendorff, Peter Bourgonje, Maria Berger, Jens Rauenbusch, André Schmidt, Mikka Wild

Figure 1 for Towards Discourse Parsing-inspired Semantic Storytelling
Figure 2 for Towards Discourse Parsing-inspired Semantic Storytelling
Figure 3 for Towards Discourse Parsing-inspired Semantic Storytelling
Figure 4 for Towards Discourse Parsing-inspired Semantic Storytelling

Previous work of ours on Semantic Storytelling uses text analytics procedures including Named Entity Recognition and Event Detection. In this paper, we outline our longer-term vision on Semantic Storytelling and describe the current conceptual and technical approach. In the project that drives our research we develop AI-based technologies that are verified by partners from industry. One long-term goal is the development of an approach for Semantic Storytelling that has broad coverage and that is, furthermore, robust. We provide first results on experiments that involve discourse parsing, applied to a concrete use case, "Explore the Neighbourhood!", which is based on a semi-automatically collected data set with documents about noteworthy people in one of Berlin's districts. Though automatically obtaining annotations for coherence relations from plain text is a non-trivial challenge, our preliminary results are promising. We envision our approach to be combined with additional features (NER, coreference resolution, knowledge graphs

* Proceedings of QURATOR 2020: The conference for intelligent content solutions, Berlin, Germany, February 2020 
Viaarxiv icon

Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability

Apr 17, 2020
Georg Rehm, Dimitrios Galanis, Penny Labropoulou, Stelios Piperidis, Martin Welß, Ricardo Usbeck, Joachim Köhler, Miltos Deligiannis, Katerina Gkirtzou, Johannes Fischer, Christian Chiarcos, Nils Feldhus, Julián Moreno-Schneider, Florian Kintzel, Elena Montiel, Víctor Rodríguez Doncel, John P. McCrae, David Laqua, Irina Patricia Theile, Christian Dittmar, Kalina Bontcheva, Ian Roberts, Andrejs Vasiljevs, Andis Lagzdiņš

Figure 1 for Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability
Figure 2 for Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability
Figure 3 for Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability
Figure 4 for Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability

With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest to implement in a wider federation of AI/LT platforms. We illustrate the approach using the five emerging AI/LT platforms AI4EU, ELG, Lynx, QURATOR and SPEAKER.

* Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP 2020). To appear 
Viaarxiv icon

A Workflow Manager for Complex NLP and Content Curation Pipelines

Apr 16, 2020
Julián Moreno-Schneider, Peter Bourgonje, Florian Kintzel, Georg Rehm

Figure 1 for A Workflow Manager for Complex NLP and Content Curation Pipelines
Figure 2 for A Workflow Manager for Complex NLP and Content Curation Pipelines

We present a workflow manager for the flexible creation and customisation of NLP processing pipelines. The workflow manager addresses challenges in interoperability across various different NLP tasks and hardware-based resource usage. Based on the four key principles of generality, flexibility, scalability and efficiency, we present the first version of the workflow manager by providing details on its custom definition language, explaining the communication components and the general system architecture and setup. We currently implement the system, which is grounded and motivated by real-world industry use cases in several innovation and transfer projects.

* Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP 2020). To appear 
Viaarxiv icon

Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling

Mar 29, 2020
Dmitrii Aksenov, Julián Moreno-Schneider, Peter Bourgonje, Robert Schwarzenberg, Leonhard Hennig, Georg Rehm

Figure 1 for Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Figure 2 for Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Figure 3 for Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Figure 4 for Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling

We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modelling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both models outperform the baseline in ROUGE scores on two datasets and show its superiority in a manual qualitative analysis.

* Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear 
Viaarxiv icon

A Dataset of German Legal Documents for Named Entity Recognition

Mar 29, 2020
Elena Leitner, Georg Rehm, Julián Moreno-Schneider

Figure 1 for A Dataset of German Legal Documents for Named Entity Recognition
Figure 2 for A Dataset of German Legal Documents for Named Entity Recognition
Figure 3 for A Dataset of German Legal Documents for Named Entity Recognition
Figure 4 for A Dataset of German Legal Documents for Named Entity Recognition

We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNNL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.

* Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear 
Viaarxiv icon

Orchestrating NLP Services for the Legal Domain

Mar 28, 2020
Julián Moreno-Schneider, Georg Rehm, Elena Montiel-Ponsoda, Víctor Rodriguez-Doncel, Artem Revenko, Sotirios Karampatakis, Maria Khvalchik, Christian Sageder, Jorge Gracia, Filippo Maganza

Figure 1 for Orchestrating NLP Services for the Legal Domain
Figure 2 for Orchestrating NLP Services for the Legal Domain
Figure 3 for Orchestrating NLP Services for the Legal Domain
Figure 4 for Orchestrating NLP Services for the Legal Domain

Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which includes partners from industry and research. The key contribution of this paper is a workflow manager that enables the flexible orchestration of workflows based on a portfolio of Natural Language Processing and Content Curation services as well as a Multilingual Legal Knowledge Graph that contains semantic information and meaningful references to legal documents. We also describe different use cases with which we experiment and develop prototypical solutions.

* Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear 
Viaarxiv icon