Ion Androutsopoulos

Probabilistic Cascading for Large Scale Hierarchical Classification

May 09, 2015
Aris Kosmopoulos, Georgios Paliouras, Ion Androutsopoulos

Hierarchies are frequently used for the organization of objects. Given a hierarchy of classes, two main approaches are used to automatically classify new instances: flat classification and cascade classification. Flat classification ignores the hierarchy, while cascade classification greedily traverses the hierarchy from the root to the predicted leaf. In this paper we propose a new approach, which extends cascade classification to predict the right leaf by estimating the probability of each root-to-leaf path. We provide experimental results which indicate that, using the same classification algorithm, one can achieve better results with our approach than with traditional flat and cascade classification.
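
The difference between greedy cascading and scoring whole root-to-leaf paths can be illustrated with a small sketch. The abstract does not say how path probabilities are estimated; the version below assumes a path's probability is the product of local conditional probabilities, and the toy hierarchy, node names, and probability values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy: each internal node maps to its children.
CHILDREN: Dict[str, List[str]] = {
    "root": ["A", "B"],
    "A": ["A1", "A2", "A3"],
    "B": ["B1"],
}

# Hypothetical local classifier outputs P(child | parent, x) for one instance x.
LOCAL_PROB: Dict[Tuple[str, str], float] = {
    ("root", "A"): 0.6, ("root", "B"): 0.4,
    ("A", "A1"): 0.4, ("A", "A2"): 0.3, ("A", "A3"): 0.3,
    ("B", "B1"): 1.0,
}


def greedy_cascade(root: str = "root") -> str:
    """Follow the locally most probable child at every level."""
    node = root
    while node in CHILDREN:
        parent = node
        node = max(CHILDREN[parent], key=lambda c: LOCAL_PROB[(parent, c)])
    return node


def path_probabilities(node: str = "root", prob: float = 1.0) -> Dict[str, float]:
    """Return {leaf: probability of the root-to-leaf path}.

    Assumption: path probability = product of local probabilities.
    """
    if node not in CHILDREN:  # leaf reached
        return {node: prob}
    result: Dict[str, float] = {}
    for child in CHILDREN[node]:
        result.update(path_probabilities(child, prob * LOCAL_PROB[(node, child)]))
    return result


def probabilistic_cascade() -> str:
    """Pick the leaf whose whole path is most probable."""
    probs = path_probabilities()
    return max(probs, key=probs.get)


if __name__ == "__main__":
    print("greedy:", greedy_cascade())
    print("path-based:", probabilistic_cascade())
```

In this toy case the greedy traversal commits to branch A at the root and ends at A1, whose full path probability is about 0.24, while the path-based choice is B1 with path probability 0.4. A greedy cascade cannot recover from an early decision in this way, whereas scoring full paths can.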

LSHTC: A Benchmark for Large-Scale Text Classification

Mar 30, 2015
Ioannis Partalas, Aris Kosmopoulos, Nicolas Baskiotis, Thierry Artieres, George Paliouras, Eric Gaussier, Ion Androutsopoulos, Massih-Reza Amini, Patrick Gallinari

LSHTC is a series of challenges that aims to assess the performance of classification systems in large-scale classification with a large number of classes (up to hundreds of thousands). This paper describes the datasets that have been released along the LSHTC series. The paper details the construction of the datasets and the design of the tracks, as well as the evaluation measures that we implemented and a quick overview of the results. All of these datasets are available online, and runs may still be submitted on the online server of the challenges.

Generating Natural Language Descriptions from OWL Ontologies: the NaturalOWL System

Apr 24, 2014
Ion Androutsopoulos, Gerasimos Lampouras, Dimitrios Galanis

We present NaturalOWL, a natural language generation system that produces texts describing individuals or classes of OWL ontologies. Unlike simpler OWL verbalizers, which typically express a single axiom at a time in controlled, often not entirely fluent natural language primarily for the benefit of domain experts, we aim to generate fluent and coherent multi-sentence texts for end-users. With a system like NaturalOWL, one can publish information in OWL on the Web, along with automatically produced corresponding texts in multiple languages, making the information accessible not only to computer programs and domain experts, but also to end-users. We discuss the processing stages of NaturalOWL, the optional domain-dependent linguistic resources that the system can use at each stage, and why they are useful. We also present trials showing that when the domain-dependent linguistic resources are available, NaturalOWL produces significantly better texts compared to a simpler verbalizer, and that the resources can be created with relatively light effort.

* Journal Of Artificial Intelligence Research, Volume 48, pages 671-715, 2013

Evaluation Measures for Hierarchical Classification: a unified view and novel approaches

Jul 01, 2013
Aris Kosmopoulos, Ioannis Partalas, Eric Gaussier, Georgios Paliouras, Ion Androutsopoulos

Hierarchical classification addresses the problem of classifying items into a hierarchy of classes. An important issue in hierarchical classification is the evaluation of different classification algorithms, which is complicated by the hierarchical relations among the classes. Several evaluation measures have been proposed for hierarchical classification using the hierarchy in different ways. This paper studies the problem of evaluation in hierarchical classification by analyzing and abstracting the key components of the existing performance measures. It also proposes two alternative generic views of hierarchical evaluation and introduces two corresponding novel measures. The proposed measures, along with the state-of-the-art ones, are empirically tested on three large datasets from the domain of text classification. The empirical results illustrate the undesirable behavior of existing approaches and how the proposed methods overcome most of these problems across a range of cases.

* Submitted to journal

A Survey of Paraphrasing and Textual Entailment Methods

May 30, 2010
Ion Androutsopoulos, Prodromos Malakasiotis

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.

* I. Androutsopoulos and P. Malakasiotis, "A Survey of Paraphrasing and Textual Entailment Methods". Journal of Artificial Intelligence Research, 38:135-187, 2010
* Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 2010

Learning to Order Facts for Discourse Planning in Natural Language Generation

Jun 13, 2003
Aggeliki Dimitromanolaki, Ion Androutsopoulos

This paper presents a machine learning approach to discourse planning in natural language generation. More specifically, we address the problem of learning the most natural ordering of facts in discourse plans for a specific domain. We discuss our methodology and how it was instantiated using two different machine learning algorithms. A quantitative evaluation performed in the domain of museum exhibit descriptions indicates that our approach performs significantly better than manually constructed ordering rules. Being retrainable, the resulting planners can be ported easily to other similar domains, without requiring language technology expertise.

* Proceedings of EACL 2003 Workshop on Natural Language Generation
* 8 pages, 4 figures, 1 table

Ellogon: A New Text Engineering Platform

May 13, 2002
Georgios Petasis, Vangelis Karkaletsis, Georgios Paliouras, Ion Androutsopoulos, Constantine D. Spyropoulos

This paper presents Ellogon, a multi-lingual, cross-platform, general-purpose text engineering environment. Ellogon was designed to aid both researchers in natural language processing and companies that produce language engineering systems for the end-user. Ellogon provides a powerful TIPSTER-based infrastructure for managing, storing and exchanging textual data, embedding and managing text processing components, as well as visualising textual data and their associated linguistic information. Among its key features are full Unicode support, an extensive multi-lingual graphical user interface, its modular architecture and the reduced hardware requirements.

* 7 pages, 9 figures. To be presented at the Third International Conference on Language Resources and Evaluation - LREC 2002

Generating Multilingual Personalized Descriptions of Museum Exhibits - The M-PIRO Project

Oct 29, 2001
Ion Androutsopoulos, Vassiliki Kokkinaki, Aggeliki Dimitromanolaki, Jo Calder, Jon Oberlander, Elena Not

This paper provides an overall presentation of the M-PIRO project. M-PIRO is developing technology that will allow museums to automatically generate textual or spoken descriptions of exhibits for collections available over the Web or in virtual reality environments. The descriptions are generated in several languages from information in a language-independent database and small fragments of text, and they can be tailored according to the backgrounds of the users, their ages, and their previous interaction with the system. An authoring tool allows museum curators to update the system's database and to control the language and content of the resulting descriptions. Although the project is still in progress, a Web-based demonstrator that supports English, Greek and Italian is already available, and it is used throughout the paper to highlight the capabilities of the emerging technology.

* 15 pages. Presented at the 29th Conference on Computer Applications and Quantitative Methods in Archaeology, Gotland, Sweden, 2001. A version of the paper with higher quality images can be downloaded from: http://www.iit.demokritos.gr/~ionandr/caa_paper.pdf

Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach

Sep 18, 2000
Ion Androutsopoulos, Georgios Paliouras, Vangelis Karkaletsis, Georgios Sakkis, Constantine D. Spyropoulos, Panagiotis Stamatopoulos

We investigate the performance of two machine learning algorithms in the context of anti-spam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an effective method to automatically construct anti-spam filters with superior performance. We thoroughly investigate the performance of the Naive Bayesian filter on a publicly available corpus, contributing towards standard benchmarks. At the same time, we compare the performance of the Naive Bayesian filter to an alternative memory-based learning approach, after introducing suitable cost-sensitive evaluation measures. Both methods achieve very accurate spam filtering, clearly outperforming the keyword-based filter of a widely used e-mail reader.
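
As a rough illustration of the kind of filter discussed here, the sketch below trains a tiny Naive Bayesian classifier and applies a decision threshold. It is not the paper's implementation. The multinomial model with add-one smoothing, the toy messages, and the use of a stricter threshold to reflect that blocking a legitimate message is costlier than letting spam through are all assumptions made for this example; the abstract does not specify the cost-sensitive measures.

```python
import math
from collections import Counter

# Hypothetical toy training data: (tokens, label).
TRAIN = [
    (["cheap", "pills", "offer"], "spam"),
    (["win", "cash", "offer"], "spam"),
    (["meeting", "agenda", "monday"], "legit"),
    (["project", "report", "monday"], "legit"),
]


def train(messages):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(label for _, label in messages)
    word_counts = {"spam": Counter(), "legit": Counter()}
    for tokens, label in messages:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in messages for w in tokens}
    return class_counts, word_counts, vocab


def spam_probability(tokens, model):
    """Estimate P(spam | tokens) with add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    log_score = {}
    for label in ("spam", "legit"):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for w in tokens:
            if w in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][w] + 1) / (n_words + len(vocab))
                )
        log_score[label] = score
    # Normalize in a numerically stable way.
    top = max(log_score.values())
    exp = {k: math.exp(v - top) for k, v in log_score.items()}
    return exp["spam"] / (exp["spam"] + exp["legit"])


def is_spam(tokens, model, threshold=0.5):
    """Flag as spam only if P(spam | tokens) exceeds the threshold.

    Raising the threshold makes the filter more reluctant to block mail.
    """
    return spam_probability(tokens, model) > threshold


MODEL = train(TRAIN)

if __name__ == "__main__":
    msg = ["cheap", "offer"]
    print(spam_probability(msg, MODEL))
    print(is_spam(msg, MODEL), is_spam(msg, MODEL, threshold=0.9))
```

With the default threshold the message `["cheap", "offer"]` is flagged as spam, while a stricter threshold of 0.9 lets it through, which shows how the threshold trades missed spam against wrongly blocked legitimate mail.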

* Proceedings of the workshop "Machine Learning and Textual Information Access", 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2000), H. Zaragoza, P. Gallinari and M. Rajman (Eds.), Lyon, France, September 2000, pp. 1-13

Selectional Restrictions in HPSG

Aug 23, 2000
Ion Androutsopoulos, Robert Dale

Selectional restrictions are semantic sortal constraints imposed on the participants of linguistic constructions to capture contextually-dependent constraints on interpretation. Despite their limitations, selectional restrictions have proven very useful in natural language applications, where they have been used frequently in word sense disambiguation, syntactic disambiguation, and anaphora resolution. Given their practical value, we explore two methods to incorporate selectional restrictions in the HPSG theory, assuming that the reader is familiar with HPSG. The first method employs HPSG's Background feature and a constraint-satisfaction component pipe-lined after the parser. The second method uses subsorts of referential indices, and blocks readings that violate selectional restrictions during parsing. While theoretically less satisfactory, we have found the second method particularly useful in the development of practical systems.

* Proceedings of the 18th International Conference on Computational Linguistics (COLING), Saarbrucken, Germany, 31 July - 4 August 2000, pages 15-20
