Computer Laboratory, University of Cambridge
Abstract:This position paper suggests that progress with automatic summarising demands a better research methodology and a carefully focussed research strategy. In order to develop effective procedures it is necessary to identify and respond to the context factors, i.e. input, purpose, and output factors, that bear on summarising and its evaluation. The paper analyses and illustrates these factors and their implications for evaluation. It then argues that this analysis, together with the state of the art and the intrinsic difficulty of summarising, imply a nearer-term strategy concentrating on shallow, but not surface, text analysis and on indicative summarising. This is illustrated with current work, from which a potentially productive research programme can be developed.
Abstract:Information technology should have much to offer linguistics, not only through the opportunities offered by large-scale data analysis and the stimulus to develop formal computational models, but through the chance to use language in systems for automatic natural language processing. The paper discusses these possibilities in detail, and then examines the actual work that has been done. It is evident that this has so far been primarily research within a new field, computational linguistics, which is largely motivated by the demands, and interest, of practical processing systems, and that information technology has had rather little influence on linguistics at large. There are different reasons for this, and not all good ones: information technology deserves more attention from linguists.
Abstract:Given the present state of work in natural language processing, this address argues first, that advance in both science and applications requires a revival of concern about what language is about, broadly speaking the world; and second, that an attack on the summarising task, which is made ever more important by the growth of electronic text resources and requires an understanding of the role of large-scale discourse structure in marking important text content, is a good way forward.