Yoshihiko Gotoh

Information Extraction from Broadcast News

Mar 30, 2000

Yoshihiko Gotoh, Steve Renals

Abstract: This paper discusses the development of trainable statistical models for extracting content from television and radio news broadcasts. In particular, we concentrate on statistical finite state models for identifying proper names and other named entities in broadcast speech. Two models are presented: the first represents name class information as a word attribute; the second represents both word-word and class-class transitions explicitly. A common n-gram based formulation is used for both models. The task of named entity identification is characterized by relatively sparse training data, and issues related to smoothing are discussed. Experiments are reported using the DARPA/NIST Hub-4E evaluation for North American Broadcast News.

* 12 pages, 3 figures, Philosophical Transactions of the Royal Society of London, series A: Mathematical, Physical and Engineering Sciences, vol. 358, 2000
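
As a rough illustration of the first model described in the abstract (name class carried as a word attribute), the sketch below treats each token as a joint (word, class) pair and estimates bigram probabilities over those pairs. The toy data, the function names `train_bigram` and `bigram_prob`, the class labels, and the add-k smoothing are all assumptions made for this example; the abstract does not say which smoothing scheme the paper uses.

```python
from collections import Counter

def train_bigram(tagged_sentences):
    """Count histories and bigrams over joint (word, class) tokens."""
    history, bigrams = Counter(), Counter()
    for sent in tagged_sentences:
        tokens = [("<s>", "O")] + list(sent)  # hypothetical start symbol
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1            # times prev was seen as a history
            bigrams[(prev, cur)] += 1
    vocab = {tok for sent in tagged_sentences for tok in sent}
    return history, bigrams, vocab

def bigram_prob(prev, cur, history, bigrams, vocab, k=0.5):
    """Add-k smoothed P(cur | prev); k is an illustrative choice."""
    return (bigrams[(prev, cur)] + k) / (history[prev] + k * len(vocab))

# Toy training data (invented for illustration only).
data = [
    [("steve", "PERSON"), ("renals", "PERSON"), ("spoke", "O")],
    [("in", "O"), ("london", "LOCATION")],
    [("london", "LOCATION"), ("spoke", "O")],
]
history, bigrams, vocab = train_bigram(data)

seen = bigram_prob(("steve", "PERSON"), ("renals", "PERSON"), history, bigrams, vocab)
unseen = bigram_prob(("steve", "PERSON"), ("london", "LOCATION"), history, bigrams, vocab)
total = sum(bigram_prob(("<s>", "O"), t, history, bigrams, vocab) for t in vocab)
```

Because the class is part of the token, an unseen transition still receives non-zero probability through smoothing, which is one simple way to cope with the sparse training data the abstract mentions.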

Variable Word Rate N-grams

Mar 29, 2000

Yoshihiko Gotoh, Steve Renals

Abstract: The rate of occurrence of words is not uniform but varies from document to document. Despite this observation, parameters for conventional n-gram language models are usually derived using the assumption of a constant word rate. In this paper we investigate the use of a variable word rate assumption, modelled by a Poisson distribution or a continuous mixture of Poissons. We present an approach to estimating the relative frequencies of words or n-grams taking prior information of their occurrences into account. Discounting and smoothing schemes are also considered. Using the Broadcast News task, the approach demonstrates a perplexity reduction of up to 10%.

* 4 pages, 4 figures, ICASSP-2000
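
To make the contrast between a single Poisson and a continuous mixture of Poissons concrete, the sketch below compares the two count distributions for one word at the same mean rate. It assumes a gamma mixing density, which yields the negative binomial distribution; the abstract does not state which mixing density the paper uses, and the function names and parameter values here are illustrative only.

```python
import math

def poisson_pmf(k, mean):
    """P(count = k) when the word occurs at a fixed Poisson rate."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def gamma_poisson_pmf(k, mean, shape):
    """P(count = k) for a Poisson whose rate is gamma distributed
    (assumed mixing density); this is the negative binomial."""
    p = shape / (shape + mean)
    log_coeff = math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
    return math.exp(log_coeff + shape * math.log(p) + k * math.log(1.0 - p))

mean, shape = 2.0, 0.5  # illustrative values, not taken from the paper

poisson_total = sum(poisson_pmf(k, mean) for k in range(200))
mixture_total = sum(gamma_poisson_pmf(k, mean, shape) for k in range(200))
mixture_mean = sum(k * gamma_poisson_pmf(k, mean, shape) for k in range(200))

# Probability that the word does not occur at all in a document.
p0_poisson = poisson_pmf(0, mean)
p0_mixture = gamma_poisson_pmf(0, mean, shape)
```

With the same mean, the mixture places more probability on both zero occurrences and very high counts than the single Poisson does, which matches the observation that word rates vary from document to document.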
