Daniel Varab

Experimental Standards for Deep Learning Research: A Natural Language Processing Perspective

Apr 13, 2022
Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Christian Hardmeier, Barbara Plank

The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well. Yet, as with other fields employing DL techniques, there has been a lack of common experimental standards compared to more established disciplines. Starting from fundamental scientific principles, we distill ongoing discussions on experimental standards in DL into a single, widely applicable methodology. Following these best practices is crucial to strengthen experimental evidence, improve reproducibility, and enable scientific progress. These standards are further collected in a public repository to help them transparently adapt to future needs.

Via arXiv

The Danish Gigaword Project

May 08, 2020
Leon Strømberg-Derczynski, Rebekah Baglini, Morten H. Christiansen, Manuel R. Ciosici, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen, Malte Lau Petersen, Jonathan Hvithamar Rystrøm, Daniel Varab

Danish is a North Germanic/Scandinavian language spoken primarily in Denmark, a country with a tradition of technological and scientific innovation. However, from a technological perspective, the Danish language has received relatively little attention and, as a result, Danish language technology is hard to develop, in part due to a lack of large or broad-coverage Danish corpora. This paper describes the Danish Gigaword project, which aims to construct a freely available one-billion-word corpus of Danish text that represents the breadth of the written language.

Via arXiv

UniParse: A universal graph-based parsing toolkit

Jul 11, 2018
Daniel Varab, Natalie Schluter

This paper describes the design and use of the graph-based parsing framework and toolkit UniParse, released as an open-source Python software package. As a framework, UniParse streamlines research prototyping, development, and evaluation of graph-based dependency parsing architectures. It does this by enabling highly efficient, sufficiently independent, easily readable, and easily extensible implementations of all dependency parser components. We distribute the toolkit with ready-made configurations as re-implementations of all current state-of-the-art first-order graph-based parsers, including even more efficient Cython implementations of both encoders and decoders, as well as the required specialised loss functions.

Via arXiv