Part-of-Speech Tagging with Minimal Lexicalization

Dec 27, 2003
Virginia Savova, Leonid Peshkin


We use a Dynamic Bayesian Network to represent compactly a variety of sublexical and contextual features relevant to Part-of-Speech (PoS) tagging. The outcome is a flexible tagger (LegoTag) with state-of-the-art performance (3.6% error on a benchmark corpus). We explore the effect of eliminating redundancy and radically reducing the size of feature vocabularies. We find that a small but linguistically motivated set of suffixes results in improved cross-corpora generalization. We also show that a minimal lexicon limited to function words is sufficient to ensure reasonable performance.
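As a rough illustration of the kind of sublexical and lexical observations the abstract describes, the sketch below maps a token to three features: capitalization, a suffix drawn from a small fixed set, and membership in a function-word lexicon. The suffix list, the function-word list, and the name `observed_features` are hypothetical and chosen only for illustration; they are not the inventories or code used in LegoTag, and the Dynamic Bayesian Network that would consume these features is not shown.

```python
from typing import Dict, Optional

# Hypothetical inventories, for illustration only (not the paper's lists).
SUFFIXES = ("ing", "ed", "ly", "s")
FUNCTION_WORDS = frozenset({"the", "a", "of", "to", "and", "in"})


def observed_features(token: str) -> Dict[str, Optional[object]]:
    """Map a token to a small set of observed features.

    Content words are never looked up by identity: only function words
    keep their lexical form, mirroring the idea of a minimal lexicon.
    """
    lower = token.lower()
    # First matching suffix from the fixed set; the token must be longer
    # than the suffix so that a bare "s" or "ed" is not treated as suffixed.
    suffix = next(
        (s for s in SUFFIXES if lower.endswith(s) and len(lower) > len(s)),
        None,
    )
    return {
        "capitalized": token[:1].isupper(),
        "suffix": suffix,
        "function_word": lower if lower in FUNCTION_WORDS else None,
    }
```

Under these assumptions, an unknown word such as "Running" would still yield a capitalization flag and the suffix "ing", which is the sort of evidence a tagger can use without a full lexicon.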

* 10 pages text; 1 figure. To appear in "Current Issues in Linguistic Theory: Recent Advances in Natural Language Processing"; John Benjamins Publishers, Amsterdam.
