Juan-Manuel Torres-Moreno

Systèmes du LIA à DEFT'13

Feb 21, 2017

Xavier Bost, Ilaria Brunetti, Luis Adrián Cabrera-Diego, Jean-Valère Cossu, Andréa Linhares, Mohamed Morchid, Juan-Manuel Torres-Moreno, Marc El-Bèze, Richard Dufour

Abstract: The 2013 Défi de Fouille de Textes (DEFT) campaign addresses two types of language analysis tasks, document classification and information extraction, in the specialized domain of cooking recipes. We present the systems that the LIA used in DEFT 2013. Our systems show interesting results despite the complexity of the proposed tasks.

* Proceedings of the Ninth DEFT Workshop, DEFT2013, Les Sables-d'Olonne, France, 21st June 2013
* 12 pages, 3 tables, (Paper in French)

Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization

Feb 21, 2017

Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno, Azucena Montes Rendón, Gerardo Sierra

Abstract: In this paper we describe a dynamic normalization process applied to multilingual social network documents (Facebook and Twitter) to improve performance on the author profiling task for short texts. After normalization, character n-grams and POS-tag n-grams are extracted to capture as much as possible of the stylistic information encoded in the documents (emoticons, character flooding, capital letters, references to other users, hyperlinks, hashtags, etc.). Experiments with an SVM reached up to 90% performance.

* Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Vol 1: KDIR, 307-314, 2016, Porto, Portugal
* 8 pages, 6 figures, Conference paper
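
The n-gram extraction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `ngram_counts` is hypothetical, the POS tags in the usage note are made-up placeholders, and the dynamic normalization step is not reproduced because the abstract does not specify it.

```python
from collections import Counter
from typing import Counter as CounterType, Sequence, Tuple


def ngram_counts(seq: Sequence[str], n: int) -> CounterType[Tuple[str, ...]]:
    """Count contiguous n-grams over any sequence of symbols.

    Passing a string yields character n-grams; passing a list of
    POS tags yields POS n-grams. (Hypothetical helper for illustration.)
    """
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


# Character bigrams keep stylistic cues such as character flooding.
char_bigrams = ngram_counts("hahaha", 2)

# POS bigrams over an assumed, already-tagged sentence.
pos_bigrams = ngram_counts(["DET", "NOUN", "VERB", "DET", "NOUN"], 2)
```

The resulting counts could then be turned into feature vectors for a classifier such as an SVM, which is the model family the abstract reports using.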

LIA-RAG: a system based on graphs and divergence of probabilities applied to Speech-To-Text Summarization

Jan 26, 2016

Elvys Linhares Pontes, Juan-Manuel Torres-Moreno, Andréa Carneiro Linhares

Abstract: This paper introduces a new algorithm for automatic speech-to-text summarization based on statistical divergences of probabilities and graphs. The input is a noisy text transcribed from spoken conversations, and the output is a compact text summary. Our results on the French corpus of the CCCS Multiling 2015 pilot task are very encouraging.

* 7 pages, 2 figures, CCCS Multiling 2015 Workshop

Regroupement sémantique de définitions en espagnol

Jan 20, 2015

Gerardo Sierra, Juan-Manuel Torres-Moreno, Alejandro Molina

Abstract: This article describes and evaluates a new unsupervised learning method for clustering definitions in Spanish according to their semantics. Textual Energy is used as the clustering measure, and we study an adaptation of Precision and Recall to evaluate our method.

* 11 pages, in French, 5 figures. Workshop Evaluation des méthodes d'Extraction de Connaissances dans les Données EvalECD EGC'10, 2010 Tunis

Optimisation using Natural Language Processing: Personalized Tour Recommendation for Museums

Jan 06, 2015

Mayeul Mathias, Assema Moussa, Fen Zhou, Juan-Manuel Torres-Moreno, Marie-Sylvie Poli, Didier Josselin, Marc El-Bèze, Andréa Carneiro Linhares, Francoise Rigat

Abstract: This paper proposes a new method for personalized tour recommendation for museum visits. It combines an optimization of visitors' preference criteria with an automatic extraction of artwork importance from museum information, based on Natural Language Processing using textual energy. The project brings together researchers from computer science and the social sciences. Numerical experiments show that our model clearly improves the satisfaction of a visitor who follows the proposed tour. This work foreshadows interesting outcomes and applications for on-demand personalized museum visits in the very near future.

* 8 pages, 4 figures; Proceedings of the 2014 Federated Conference on Computer Science and Information Systems pp. 439-446

Un résumeur à base de graphes, indépéndant de la langue

Jan 06, 2015

Juan-Manuel Torres-Moreno, Javier Ramirez, Iria da Cunha

Abstract: In this paper we present REG, a graph-based approach to a fundamental problem of Natural Language Processing (NLP): automatic text summarization. The algorithm maps a document to a graph, then computes the weight of its sentences. We have applied this approach to summarize documents in three languages.

* 8 pages, in French, 2 figures; International Workshop on African Human Language Technologies

Sentence Compression in Spanish driven by Discourse Segmentation and Language Models

Dec 17, 2012

Alejandro Molina, Juan-Manuel Torres-Moreno, Iria da Cunha, Eric SanJuan, Gerardo Sierra

Abstract: Previous work has demonstrated that Automatic Text Summarization (ATS) by sentence extraction may be improved using sentence compression. In this work we present a sentence compression approach guided by sentence-level discourse segmentation and probabilistic language models (LM). The results presented here show that the proposed solution is able to generate coherent summaries with grammatical compressed sentences. The approach is simple enough to be transposed to other languages.

* 7 pages, 3 tables

Condensés de textes par des méthodes numériques

Dec 09, 2012

Juan-Manuel Torres-Moreno, Patricia Velázquez-Morales, Jean-Guy Meunier

Abstract: Since information in electronic form is now standard, and its variety and quantity keep growing, methods for summarizing or automatically condensing texts are a critical phase of text analysis. This article describes CORTEX, a system based on numerical methods that produces a condensation of a text independently of the topic and length of the text. The structure of the system enables it to produce abstracts in French or Spanish in a very short time.

* Conférence JADT 2002, Saint-Malo/France. 12 pages, 7 figures

Artex is AnotheR TEXt summarizer

Oct 11, 2012

Juan-Manuel Torres-Moreno

Abstract: This paper describes Artex, another algorithm for Automatic Text Summarization. To rank sentences, a simple inner product is calculated between each sentence, a document vector (text topic) and a lexical vector (vocabulary used by a sentence). Summaries are then generated by assembling the highest-ranked sentences. No rule-based linguistic post-processing is necessary to obtain summaries. Tests over several datasets (from the Document Understanding Conferences (DUC), Text Analysis Conferences (TAC), evaluation campaigns, etc.) in French, English and Spanish have shown that the summarizer achieves interesting results.

* 11 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:1209.3126
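
The ranking idea in the abstract can be sketched roughly as follows. This is one plausible reading, not the published Artex formulation: the averaging used for the topic vector and the lexical weight, and the names `tokenize`, `rank_sentences` and `summarize`, are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Lowercase word tokenizer (assumed preprocessing, not from the paper)."""
    return re.findall(r"\w+", sentence.lower())


def rank_sentences(sentences: List[str]) -> List[float]:
    """Score each sentence by (sentence . topic vector) * lexical weight."""
    bags = [Counter(tokenize(s)) for s in sentences]
    vocab = set().union(*bags) if bags else set()
    if not vocab:
        return [0.0] * len(sentences)
    # Topic vector: average frequency of each term over all sentences.
    topic = {t: sum(bag[t] for bag in bags) / len(bags) for t in vocab}
    scores = []
    for bag in bags:
        # Lexical weight: average term frequency of this sentence over the vocabulary.
        lexical = sum(bag.values()) / len(vocab)
        inner = sum(freq * topic[term] for term, freq in bag.items())
        scores.append(inner * lexical)
    return scores


def summarize(sentences: List[str], k: int = 1) -> List[str]:
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


doc = ["the cat sat on the mat", "dogs bark", "the cat likes the mat"]
summary = summarize(doc, k=2)
```

Here the short off-topic sentence ("dogs bark") scores lowest, because it shares no terms with the dominant vocabulary and uses few words, so it is the one left out of the two-sentence summary.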

Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

Sep 14, 2012

Juan-Manuel Torres-Moreno

Abstract: In Automatic Text Summarization, preprocessing is an important phase for reducing the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even with normalization, the curse of dimensionality can degrade the performance of summarizers on large texts. This paper describes a new method for normalizing words to further reduce the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of the summaries produced with this representation, but can often dramatically improve the performance of the systems. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in performance, regardless of the summarizer system used.

* 22 pages, 12 figures, 9 tables
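
As a rough illustration of the idea stated in the abstract, reducing each word to its initial letters, here is a minimal sketch. The tokenizer, the function name `ultra_stem_tokens`, and the parameter `n` (number of letters kept) are assumptions for illustration, not details taken from the paper.

```python
import re
from typing import List


def ultra_stem_tokens(text: str, n: int = 1) -> List[str]:
    """Tokenize text and keep only the first n letters of each word.

    A sketch of ultra-stemming as described in the abstract; the
    regex tokenizer and lowercasing are assumed preprocessing.
    """
    return [token[:n] for token in re.findall(r"\w+", text.lower())]


text = "Summarization summarizes summaries of résumés"
stems = ultra_stem_tokens(text, n=3)

# Vocabulary size before and after truncation.
vocab_before = len(set(re.findall(r"\w+", text.lower())))
vocab_after = len(set(stems))
```

In this toy input, five distinct word forms collapse to three distinct truncated forms, which is the kind of reduction in representation space the abstract refers to.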
