A Maximum Entropy Approach to Identifying Sentence Boundaries

Apr 09, 1997
Jeffrey C. Reynar, Adwait Ratnaparkhi



We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains.
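
The classification setup described in the abstract can be illustrated with a small sketch. It treats every ., ?, and ! as a candidate, extracts a few contextual features, and fits a two-class maximum entropy model (equivalent to binary logistic regression). Everything specific here is an assumption made for illustration and is not taken from the paper: the feature set, the function names (features, labeled_examples, train, predict, boundaries), the toy training sentences, and the use of plain stochastic gradient ascent as the estimation procedure.

```python
import math
import re
from collections import defaultdict

CANDIDATE = re.compile(r"[.?!]")  # the three candidate boundary marks

# Hypothetical toy corpus: each string is one gold sentence ending in a mark.
TOY_SENTENCES = [
    "Dr. Smith arrived late.",
    "Was Mr. Jones there?",
    "The file is at example.com today!",
    "Dr. Jones left.",
]

def features(text, i):
    """Illustrative features for the candidate mark at offset i."""
    start = i
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    end = i + 1
    while end < len(text) and not text[end].isspace():
        end += 1
    prefix = text[start:i]        # part of the token before the mark
    suffix = text[i + 1:end]      # part of the token after the mark
    rest = text[end:].split(None, 1)
    nxt = rest[0] if rest else ""  # following token, if any
    return [
        "bias",
        "mark=" + text[i],
        "prefix=" + prefix.lower(),
        "has_suffix=" + str(bool(suffix)),
        "next_cap=" + str(nxt[:1].isupper()),
    ]

def labeled_examples(sentences):
    """Join gold sentences and label each candidate 1 (boundary) or 0."""
    text = " ".join(sentences)
    gold, end = set(), -1
    for s in sentences:
        end += len(s)   # offset of the sentence's final character
        gold.add(end)
        end += 1        # skip the joining space
    return [(features(text, m.start()), int(m.start() in gold))
            for m in CANDIDATE.finditer(text)]

def predict(w, feats):
    """Probability that the candidate is a sentence boundary."""
    z = sum(w.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=300, lr=0.1):
    """Stochastic gradient ascent on the log-likelihood (an assumed estimator)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            step = lr * (label - predict(w, feats))
            for f in feats:
                w[f] += step
    return dict(w)

def boundaries(text, w, threshold=0.5):
    """Offsets of candidate marks classified as sentence boundaries."""
    return [m.start() for m in CANDIDATE.finditer(text)
            if predict(w, features(text, m.start())) >= threshold]

weights = train(labeled_examples(TOY_SENTENCES))
sample = "Dr. Brown spoke. Mr. Jones listened."
for m in CANDIDATE.finditer(sample):
    p = predict(weights, features(sample, m.start()))
    print(m.start(), repr(sample[max(0, m.start() - 6):m.start() + 1]), round(p, 3))
```

Because training needs only text with marked sentence boundaries, retraining for a new genre amounts to calling train on examples built from a different annotated corpus. A corpus this small is only enough to show the mechanics, so the printed probabilities should not be read as indicative of real performance.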

* Proceedings of the 5th ANLP Conference, 1997 
* 4 pages, uses aclap.sty and covingtn.sty 

