

Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers


Jul 13, 2000
Jakub Zavrel, Walter Daelemans




This paper describes a new method, Combi-bootstrap, that exploits existing taggers and lexical resources to annotate corpora with new tagsets. Combi-bootstrap uses the output of existing resources as features for a second-level machine learning module, which is trained on a very small sample of annotated corpus material to map those features to the new tagset. Experiments show that Combi-bootstrap (i) can integrate a wide variety of existing resources, and (ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed from the same small training sample.
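The general idea of feeding existing taggers' output to a second-level learner can be illustrated with a minimal sketch. It is not the authors' implementation: the learner here is a simple lookup table with back-off, chosen only for brevity, and the tag names, the two "existing taggers", and the functions `train_combiner` and `predict` are all hypothetical.

```python
from collections import Counter, defaultdict

def train_combiner(samples):
    """Train a toy second-level learner.

    samples: list of (features, new_tag), where features is a tuple holding
    the tag each existing tagger proposed for one token (assumed layout).
    """
    full = defaultdict(Counter)    # whole feature tuple -> new-tag counts
    single = defaultdict(Counter)  # (tagger index, old tag) -> new-tag counts
    overall = Counter()            # new-tag counts over the whole sample
    for features, new_tag in samples:
        full[features][new_tag] += 1
        for i, old_tag in enumerate(features):
            single[(i, old_tag)][new_tag] += 1
        overall[new_tag] += 1
    return full, single, overall

def predict(model, features):
    """Map the existing taggers' output for one token to a new-tagset tag."""
    full, single, overall = model
    # 1. Exact match on the combination of old tags seen in training.
    if features in full:
        return full[features].most_common(1)[0][0]
    # 2. Back off: pool the evidence from each tagger's tag separately.
    votes = Counter()
    for i, old_tag in enumerate(features):
        votes.update(single.get((i, old_tag), {}))
    if votes:
        return votes.most_common(1)[0][0]
    # 3. Last resort: most frequent new tag in the training sample.
    return overall.most_common(1)[0][0]

# Tiny hand-annotated sample: (tagger A output, tagger B output) -> new tag.
# All tags are invented for illustration.
samples = [
    (("NN", "N"), "NOUN"),
    (("VB", "V"), "VERB"),
    (("DT", "Det"), "DET"),
    (("NN", "V"), "VERB"),   # the existing taggers disagree here
    (("NN", "N"), "NOUN"),
]
model = train_combiner(samples)
seen = predict(model, ("DT", "Det"))   # combination seen in training
unseen = predict(model, ("VB", "N"))   # unseen combination, uses back-off
```

In this sketch the existing taggers never need to share a tagset with each other or with the target; their tags are treated as opaque feature values, which is what lets heterogeneous resources be combined.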

* Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC 2000), pp. 17--20 
* 4 pages 

