A Model of Lexical Attraction and Repulsion

Jun 16, 1997
Doug Beeferman, Adam Berger, John Lafferty

This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale. Empirical data drawn from English and Japanese text, as well as conversational speech, reveals that the "attraction" between words decays exponentially, while stylistic and syntactic constraints create a "repulsion" between words that discourages close co-occurrence. We show that these characteristics are well described by simple mixture models based on two-stage exponential distributions which can be trained using the EM algorithm. The resulting distance distributions can then be incorporated as penalizing features in an exponential language model.
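
As a rough illustration of fitting a distance distribution with EM, the sketch below fits a plain two-component mixture of exponential densities to synthetic "distances". This is an assumption-laden stand-in, not the paper's two-stage exponential model: the function name `em_two_exponentials`, the continuous treatment of distances, the initial guess, and the synthetic data are all hypothetical choices made for the example.

```python
import math
import random


def em_two_exponentials(xs, iters=200):
    """Fit p(x) = w*l1*exp(-l1*x) + (1-w)*l2*exp(-l2*x) by EM.

    Illustrative only: a generic exponential mixture, not the
    two-stage exponential model described in the abstract.
    """
    n = len(xs)
    mean = sum(xs) / n
    # Crude initial guess (an assumption): one fast and one slow decay rate.
    w, l1, l2 = 0.5, 2.0 / mean, 0.5 / mean
    history = []  # log-likelihood under the parameters at the start of each iteration
    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation.
        rs = []
        loglik = 0.0
        for x in xs:
            p1 = w * l1 * math.exp(-l1 * x)
            p2 = (1.0 - w) * l2 * math.exp(-l2 * x)
            rs.append(p1 / (p1 + p2))
            loglik += math.log(p1 + p2)
        history.append(loglik)
        # M-step: closed-form updates for the weight and the two rates.
        r_sum = sum(rs)
        w = r_sum / n
        l1 = r_sum / sum(r * x for r, x in zip(rs, xs))
        l2 = (n - r_sum) / sum((1.0 - r) * x for r, x in zip(rs, xs))
    return w, l1, l2, history


# Synthetic distances: half drawn with a fast decay, half with a slow decay.
random.seed(0)
data = [random.expovariate(1.0) for _ in range(2000)]
data += [random.expovariate(0.05) for _ in range(2000)]

w, l1, l2, history = em_two_exponentials(data)
print(f"weight={w:.3f} fast_rate={l1:.3f} slow_rate={l2:.3f}")
```

Each EM iteration cannot decrease the log-likelihood, so `history` is a convenient convergence check. Real word-distance data are integer-valued, and a faithful implementation would follow the model definitions in the paper itself.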

* 8 pages, LaTeX source and postscript figures for ACL/EACL'97 paper 
