Kevin R. Nicholas

Parts-of-Speech Tagger Errors Do Not Necessarily Degrade Accuracy in Extracting Information from Biomedical Text

Apr 02, 2008

Maurice HT Ling, Christophe Lefevre, Kevin R. Nicholas

Abstract: A recent study reported the development of Muscorian, a generic text processing tool for extracting protein-protein interactions from text, which achieved performance comparable to biomedical-specific text processing tools. This result was unexpected, since errors accumulated across a series of text analysis steps are likely to degrade the outcome of the entire process. Most biomedical entity relationship extraction tools have used a biomedical-specific parts-of-speech (POS) tagger, because POS tagging errors are likely to affect subsequent semantic analysis of the text, such as shallow parsing. This study evaluates the POS tagging accuracy of a generic POS tagger, MontyTagger, and explores whether comparable performance is obtained when MontyTagger is used in place of MedPost, a tagger trained on biomedical text. Our results demonstrated that MontyTagger, Muscorian's POS tagger, has a POS tagging accuracy of 83.1% when tested on biomedical text. Replacing MontyTagger with MedPost did not significantly improve entity relationship extraction from text: precision was 55.6% with MontyTagger versus 56.8% with MedPost on directional relationships, and 86.1% with MontyTagger versus 81.8% with MedPost on nondirectional relationships. This is unexpected, as poor POS tagging by MontyTagger would be expected to affect the outcome of the information extraction. An analysis of POS tagging errors demonstrated that 78.5% of tagging errors are compensated by shallow parsing. Thus, despite its 83.1% tagging accuracy, MontyTagger has a functional tagging accuracy of 94.6%.
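
One plausible reading of "functional tagging accuracy" is that a token counts as functionally correct when its tag is right or when its tagging error is compensated by shallow parsing. The sketch below shows how such figures could be computed under that reading. It assumes, as a simplification not taken from the paper, that an error counts as compensated when the token receives the same chunk label whether the shallow parser is fed gold or predicted tags. The `Token` fields and the five-token sentence are hypothetical toy data, not the paper's corpus.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    word: str
    gold_tag: str         # reference POS tag
    predicted_tag: str    # tag assigned by the tagger under test
    gold_chunk: str       # chunk label when the shallow parser sees gold tags
    predicted_chunk: str  # chunk label when the shallow parser sees predicted tags

    @property
    def tag_correct(self) -> bool:
        return self.gold_tag == self.predicted_tag

    @property
    def chunk_preserved(self) -> bool:
        # Assumed compensation criterion (hypothetical): the parser
        # assigns the same chunk label despite the tagging error.
        return self.gold_chunk == self.predicted_chunk


def tagging_accuracy(tokens: List[Token]) -> float:
    """Fraction of tokens whose POS tag matches the reference."""
    return sum(t.tag_correct for t in tokens) / len(tokens)


def compensated_fraction(tokens: List[Token]) -> float:
    """Fraction of tagging errors that the shallow parser absorbs."""
    errors = [t for t in tokens if not t.tag_correct]
    if not errors:
        return 0.0  # no errors, so nothing to compensate
    return sum(t.chunk_preserved for t in errors) / len(errors)


def functional_accuracy(tokens: List[Token]) -> float:
    """Fraction of tokens that are correctly tagged or whose error is compensated."""
    return sum(t.tag_correct or t.chunk_preserved for t in tokens) / len(tokens)


# Hypothetical toy sentence: "IL-6 activates STAT3 in cells"
tokens = [
    Token("IL-6", "NN", "NN", "NP", "NP"),
    Token("activates", "VBZ", "NNS", "VP", "NP"),  # error that changes the chunk
    Token("STAT3", "NN", "JJ", "NP", "NP"),        # error absorbed by the NP chunk
    Token("in", "IN", "IN", "PP", "PP"),
    Token("cells", "NNS", "NNS", "NP", "NP"),
]

print(f"tagging accuracy:    {tagging_accuracy(tokens):.2f}")
print(f"compensated errors:  {compensated_fraction(tokens):.2f}")
print(f"functional accuracy: {functional_accuracy(tokens):.2f}")
```

On this toy data the tagger is 60% accurate, half of its two errors are absorbed by chunking, and its functional accuracy is 80%.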

* Ling, Maurice HT, Lefevre, Christophe, Nicholas, Kevin R. 2008. Parts-of-Speech Tagger Errors Do Not Necessarily Degrade Accuracy in Extracting Information from Biomedical Text. The Python Papers 3 (1): 65-80

Reconstruction of Protein-Protein Interaction Pathways by Mining Subject-Verb-Objects Intermediates

Aug 06, 2007

Maurice HT Ling, Christophe Lefevre, Kevin R. Nicholas, Feng Lin

Abstract: The exponential increase in the publication rate of new articles is limiting researchers' access to relevant literature. This has prompted the use of text mining tools to extract key biological information. Previous studies have reported extensive modification of existing generic text processors to process biological text. However, whether such modification is actually required has not been examined. In this study, we constructed Muscorian using MontyLingua, a generic text processor. Muscorian uses a previously proposed two-layered generalization-specialization paradigm, in which text is first processed generically into a suitable intermediate format before domain-specific data extraction techniques are applied at the specialization layer. Evaluation using a corpus and experts indicated 86-90% precision and approximately 30% recall in extracting protein-protein interactions, which is comparable to previous studies using either specialized biological text processing tools or modified existing tools. Our study also demonstrated the flexibility of the two-layered generalization-specialization paradigm by using the same generalization layer for two specialized information extraction tasks.
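
The two-layered generalization-specialization paradigm can be sketched as two functions: a generic layer that reduces each sentence to a subject-verb-object (SVO) intermediate, as the paper's title suggests, and a domain layer that filters those intermediates. This is a minimal sketch under stated assumptions: Muscorian used MontyLingua for the generic layer, whereas `generalize` here is a hypothetical toy rule over pre-tagged (word, tag) pairs, and the sentences, entity set and verb sets are illustrative, not taken from the paper. Running two filters over the same triples mirrors reusing one generalization layer for two extraction tasks.

```python
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, POS tag) pairs


class SVO(NamedTuple):
    subject: str
    verb: str
    obj: str


def generalize(sentence: TaggedSentence) -> Optional[SVO]:
    """Generalization layer (hypothetical toy stand-in for a generic text
    processor): take nouns before the first verb as the subject and nouns
    after it as the object."""
    verb_positions = [i for i, (_, tag) in enumerate(sentence) if tag.startswith("VB")]
    if not verb_positions:
        return None
    v = verb_positions[0]
    subject = " ".join(w for w, tag in sentence[:v] if tag.startswith("NN"))
    obj = " ".join(w for w, tag in sentence[v + 1:] if tag.startswith("NN"))
    if not subject or not obj:
        return None
    return SVO(subject, sentence[v][0], obj)


def specialize(triples: Iterable[SVO], entities: Set[str], verbs: Set[str]) -> List[SVO]:
    """Specialization layer: keep triples linking two known entities via a verb of interest."""
    return [t for t in triples
            if t.subject in entities and t.obj in entities and t.verb in verbs]


# Hypothetical pre-tagged toy sentences (illustration only).
sentences = [
    [("STAT5", "NN"), ("activates", "VBZ"), ("CSN2", "NN")],
    [("Prolactin", "NN"), ("binds", "VBZ"), ("PRLR", "NN")],
    [("Lactation", "NN"), ("requires", "VBZ"), ("oxytocin", "NN")],
    [("Figure", "NN"), ("1", "CD")],  # no verb, so no triple
]

# The generic layer runs once; its output is reused by both specialized tasks.
triples = [t for t in map(generalize, sentences) if t is not None]

proteins = {"STAT5", "CSN2", "Prolactin", "PRLR"}          # hypothetical entity set
interactions = specialize(triples, proteins, {"activates", "binds", "inhibits"})
binding_only = specialize(triples, proteins, {"binds"})

print(interactions)
print(binding_only)
```

Here `interactions` keeps the STAT5-CSN2 and Prolactin-PRLR triples, while `binding_only` keeps just the Prolactin-PRLR triple. The Lactation-oxytocin triple is produced by the generic layer but discarded by both filters, because neither term is in the entity set.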

* Ling, Maurice HT, Lefevre, Christophe, Nicholas, Kevin R., Lin, Feng. 2007. In J.C. Rajapakse, B. Schmidt, and G. Volkert (Eds.), PRIB 2007. Lecture Notes in Bioinformatics 4774: 286-299. Springer-Verlag.
* 2nd IAPR Workshop on Pattern Recognition in Bioinformatics (PRIB 2007). 14 pages, 4 figures
