Ion Androutsopoulos

University of Edinburgh

Ellogon: A New Text Engineering Platform

May 13, 2002

Georgios Petasis, Vangelis Karkaletsis, Georgios Paliouras, Ion Androutsopoulos, Constantine D. Spyropoulos

Abstract: This paper presents Ellogon, a multi-lingual, cross-platform, general-purpose text engineering environment. Ellogon was designed to aid both researchers in natural language processing and companies that produce language engineering systems for the end-user. Ellogon provides a powerful TIPSTER-based infrastructure for managing, storing and exchanging textual data, embedding and managing text processing components, and visualising textual data and their associated linguistic information. Among its key features are full Unicode support, an extensive multi-lingual graphical user interface, a modular architecture, and reduced hardware requirements.

* 7 pages, 9 figures. Will be presented to the Third International Conference on Language Resources and Evaluation - LREC 2002

Generating Multilingual Personalized Descriptions of Museum Exhibits - The M-PIRO Project

Oct 29, 2001

Ion Androutsopoulos, Vassiliki Kokkinaki, Aggeliki Dimitromanolaki, Jo Calder, Jon Oberlander, Elena Not

Abstract: This paper provides an overall presentation of the M-PIRO project. M-PIRO is developing technology that will allow museums to automatically generate textual or spoken descriptions of exhibits for collections available over the Web or in virtual reality environments. The descriptions are generated in several languages from information in a language-independent database and small fragments of text, and they can be tailored according to the backgrounds of the users, their ages, and their previous interaction with the system. An authoring tool allows museum curators to update the system's database and to control the language and content of the resulting descriptions. Although the project is still in progress, a Web-based demonstrator that supports English, Greek and Italian is already available, and it is used throughout the paper to highlight the capabilities of the emerging technology.

* 15 pages. Presented at the 29th Conference on Computer Applications and Quantitative Methods in Archaeology, Gotland, Sweden, 2001. A version of the paper with higher quality images can be downloaded from: http://www.iit.demokritos.gr/~ionandr/caa_paper.pdf

Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach

Sep 18, 2000

Ion Androutsopoulos, Georgios Paliouras, Vangelis Karkaletsis, Georgios Sakkis, Constantine D. Spyropoulos, Panagiotis Stamatopoulos

Abstract: We investigate the performance of two machine learning algorithms in the context of anti-spam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an effective method to automatically construct anti-spam filters with superior performance. We thoroughly investigate the performance of the Naive Bayesian filter on a publicly available corpus, contributing towards standard benchmarks. At the same time, we compare the performance of the Naive Bayesian filter to an alternative memory-based learning approach, after introducing suitable cost-sensitive evaluation measures. Both methods achieve very accurate spam filtering, clearly outperforming the keyword-based filter of a widely used e-mail reader.

* Proceedings of the workshop "Machine Learning and Textual Information Access", 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2000), H. Zaragoza, P. Gallinari and M. Rajman (Eds.), Lyon, France, September 2000, pp. 1-13

Selectional Restrictions in HPSG

Aug 23, 2000

Ion Androutsopoulos, Robert Dale

Abstract: Selectional restrictions are semantic sortal constraints imposed on the participants of linguistic constructions to capture contextually-dependent constraints on interpretation. Despite their limitations, selectional restrictions have proven very useful in natural language applications, where they have been used frequently in word sense disambiguation, syntactic disambiguation, and anaphora resolution. Given their practical value, we explore two methods to incorporate selectional restrictions in the HPSG theory, assuming that the reader is familiar with HPSG. The first method employs HPSG's Background feature and a constraint-satisfaction component pipelined after the parser. The second method uses subsorts of referential indices, and blocks readings that violate selectional restrictions during parsing. Although the second method is theoretically less satisfactory, we have found it particularly useful in the development of practical systems.

* Proceedings of the 18th International Conference on Computational Linguistics (COLING), Saarbrucken, Germany, 31 July - 4 August 2000, pages 15-20

An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages

Aug 22, 2000

Ion Androutsopoulos, John Koutsias, Konstantinos V. Chandrinos, Constantine D. Spyropoulos

Abstract: The growing problem of unsolicited bulk e-mail, also known as "spam", has generated a need for reliable anti-spam e-mail filters. Filters of this type have so far been based mostly on manually constructed keyword patterns. An alternative approach has recently been proposed, whereby a Naive Bayesian classifier is trained automatically to detect spam messages. We test this approach on a large collection of personal e-mail messages, which we make publicly available in "encrypted" form, contributing towards standard benchmarks. We introduce appropriate cost-sensitive measures, investigating at the same time the effect of attribute-set size, training-corpus size, lemmatization, and stop lists, issues that have not been explored in previous experiments. Finally, the Naive Bayesian filter is compared, in terms of performance, to a filter that uses keyword patterns and is part of a widely used e-mail reader.

* Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, N.J. Belkin, P. Ingwersen and M.-K. Leong (Eds.), Athens, Greece, July 24-28, 2000, pages 160-167

An evaluation of Naive Bayesian anti-spam filtering

Jun 07, 2000

Ion Androutsopoulos, John Koutsias, Konstantinos V. Chandrinos, George Paliouras, Constantine D. Spyropoulos

Abstract: It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail ("spam"). We conduct a thorough evaluation of this proposal on a corpus that we make publicly available, contributing towards standard benchmarks. At the same time we investigate the effect of attribute-set size, training-corpus size, lemmatization, and stop-lists on the filter's performance, issues that had not been previously explored. After introducing appropriate cost-sensitive evaluation measures, we reach the conclusion that additional safety nets are needed for the Naive Bayesian anti-spam filter to be viable in practice.

* Proceedings of the workshop on Machine Learning in the New Information Age, G. Potamias, V. Moustakis and M. van Someren (eds.), 11th European Conference on Machine Learning, Barcelona, Spain, pp. 9-17, 2000
* 9 pages

A Principled Framework for Constructing Natural Language Interfaces To Temporal Databases

Sep 23, 1996

Ion Androutsopoulos

Abstract: Most existing natural language interfaces to databases (NLIDBs) were designed to be used with "snapshot" database systems, that provide very limited facilities for manipulating time-dependent data. Consequently, most NLIDBs also provide very limited support for the notion of time. The database community is becoming increasingly interested in _temporal_ database systems. These are intended to store and manipulate in a principled manner information not only about the present, but also about the past and future. This thesis develops a principled framework for constructing English NLIDBs for _temporal_ databases (NLITDBs), drawing on research in tense and aspect theories, temporal logics, and temporal databases. I first explore temporal linguistic phenomena that are likely to appear in English questions to NLITDBs. Drawing on existing linguistic theories of time, I formulate an account for a large number of these phenomena that is simple enough to be embodied in practical NLITDBs. Exploiting ideas from temporal logics, I then define a temporal meaning representation language, TOP, and I show how the HPSG grammar theory can be modified to incorporate the tense and aspect account of this thesis, and to map a wide range of English questions involving time to appropriate TOP expressions. Finally, I present and prove the correctness of a method to translate from TOP to TSQL2, TSQL2 being a temporal extension of the SQL-92 database language. This way, I establish a sound route from English questions involving time to a general-purpose temporal database language, that can act as a principled framework for building NLITDBs. To demonstrate that this framework is workable, I employ it to develop a prototype NLITDB, implemented using ALE and Prolog.

* PhD thesis; 405 pages; LaTeX2e, uses the packages/macros: amstex, xspace, avm, examples, dvips, varioref, makeidx, epic, eepic, ecltree; postscript figures included
