Fernando Sánchez León

Universidad Autónoma de Madrid

Annotating and normalizing biomedical NEs with limited knowledge

Dec 19, 2019

Fernando Sánchez León, Ana González Ledesma

Abstract: Named entity recognition (NER) is the very first step in the linguistic processing of any new domain. It is currently a common process in BioNLP on English clinical text. However, it is still in its infancy in other major languages, as is the case for Spanish. Presented under the umbrella of the PharmaCoNER shared task, this paper describes a very simple method for the annotation and normalization of pharmacological, chemical and, ultimately, biomedical named entities in clinical cases. The system developed for the shared task is based on limited knowledge, collected, structured and munged in a way that clearly outperforms scores obtained by similar dictionary-based systems for English in the past. Along with this recovery of knowledge-based methods for NER in subdomains, the paper also highlights the key contribution of resource-based systems in the validation and consolidation of both the annotation guidelines and the human annotation practices. In this sense, some of the authors' findings on the overall quality of human-annotated datasets call into question the above-mentioned 'official' results obtained by this system, which ranked second (0.91 F1-score) and first (0.916 F1-score), respectively, in the two PharmaCoNER subtasks.

* 8 pages; unpublished contribution to the PharmaCoNER shared task held as part of BioNLP-OST 2019

GramCheck: A Grammar and Style Checker

Jul 01, 1996

Flora Ramírez Bustamante, Fernando Sánchez León

Abstract: This paper presents a grammar and style checker demonstrator for Spanish and Greek native writers developed within the project GramCheck. Besides a brief grammar error typology for Spanish, a linguistically motivated approach to detection and diagnosis is presented, based on the generalized use of PROLOG extensions to highly typed unification-based grammars. The demonstrator, currently including full coverage for agreement errors and certain head-argument relation issues, also provides correction by means of an analysis-transfer-synthesis cycle. Finally, future extensions to the current system are discussed.

* 7 pages, LaTeX format, uses colap.sty. Published: to appear in Proceedings of COLING-96

Development of a Spanish Version of the Xerox Tagger

May 19, 1995

Fernando Sánchez León, Amalio F. Nieto Serrano

Abstract: This paper describes work performed within the CRATER (Corpus Resources And Terminology ExtRaction, MLAP-93/20) project, funded by the Commission of the European Communities. In particular, it addresses the issue of adapting the Xerox Tagger to Spanish in order to tag the Spanish version of the ITU (International Telecommunications Union) corpus. The model implemented by this tagger is briefly presented, along with some modifications made to it in order to use some parameters that are not probabilistically estimated. Initial decisions, such as the tagset, the lexicon and the training corpus, are also discussed. Finally, results are presented and the benefits of the mixed model are justified.

* 13 pages

A Spanish Tagset for the CRATER Project

Jun 14, 1994

Fernando Sánchez León

Abstract: This working paper describes the Spanish tagset to be used in the context of CRATER, a CEC-funded project aiming at the creation of a multilingual (English, French, Spanish) aligned corpus using the International Telecommunications Union corpus. In this respect, each version of the corpus will be (or is currently being) tagged. The Xerox PARC tagger will be adapted to Spanish in order to perform the tagging of the Spanish version. This tagset has been devised as the ideal one for Spanish, and has been posted to several lists in order to gather feedback on it.

* 20 pages, LaTeX format
