Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

David Martins de Matos

The Influence of Context on Dialogue Act Recognition

Jan 09, 2017

Eugénio Ribeiro, Ricardo Ribeiro, David Martins de Matos

Figure 1 for The Influence of Context on Dialogue Act Recognition

Figure 2 for The Influence of Context on Dialogue Act Recognition

Figure 3 for The Influence of Context on Dialogue Act Recognition

Figure 4 for The Influence of Context on Dialogue Act Recognition

Abstract:This article presents an analysis of the influence of context information on dialog act recognition. We performed experiments on the widely explored Switchboard corpus, as well as on data annotated according to the recent ISO 24617-2 standard. The latter was obtained from the Tilburg DialogBank and through the mapping of the annotations of a subset of the Let's Go corpus. We used a classification approach based on SVMs, which had proved successful in previous work and allowed us to limit the amount of context information provided. This way, we were able to observe the influence patterns as the amount of context information increased. Our base features consisted of n-grams, punctuation, and wh-words. Context information was obtained from one to five preceding segments and provided either as n-grams or dialog act classifications, with the latter typically leading to better results and more stable influence patterns. In addition to the conclusions about the importance and influence of context information, our experiments on the Switchboard corpus also led to results that advanced the state-of-the-art on the dialog act recognition task on that corpus. Furthermore, the results obtained on data annotated according to the ISO 24617-2 standard define a baseline for future work and contribute for the standardization of experiments in the area.

* 30 pages, 8 figures, 19 tables, submitted to Computational Linguistics

Via

Access Paper or Ask Questions

Mapping the Dialog Act Annotations of the LEGO Corpus into the Communicative Functions of ISO 24617-2

Dec 05, 2016

Eugénio Ribeiro, Ricardo Ribeiro, David Martins de Matos

Figure 1 for Mapping the Dialog Act Annotations of the LEGO Corpus into the Communicative Functions of ISO 24617-2

Figure 2 for Mapping the Dialog Act Annotations of the LEGO Corpus into the Communicative Functions of ISO 24617-2

Figure 3 for Mapping the Dialog Act Annotations of the LEGO Corpus into the Communicative Functions of ISO 24617-2

Figure 4 for Mapping the Dialog Act Annotations of the LEGO Corpus into the Communicative Functions of ISO 24617-2

Abstract:In this paper we present strategies for mapping the dialog act annotations of the LEGO corpus into the communicative functions of the ISO 24617-2 standard. Using these strategies, we obtained an additional 347 dialogs annotated according to the standard. This is particularly important given the reduced amount of existing data in those conditions due to the recency of the standard. Furthermore, these are dialogs from a widely explored corpus for dialog related tasks. However, its dialog annotations have been neglected due to their high domain-dependency, which renders them unuseful outside the context of the corpus. Thus, through our mapping process, we both obtain more data annotated according to a recent standard and provide useful dialog act annotations for a widely explored corpus in the context of dialog research.

* 20 pages, 2 figures

Via

Access Paper or Ask Questions

Fast and Extensible Online Multivariate Kernel Density Estimation

Jun 08, 2016

Jaime Ferreira, David Martins de Matos, Ricardo Ribeiro

Figure 1 for Fast and Extensible Online Multivariate Kernel Density Estimation

Figure 2 for Fast and Extensible Online Multivariate Kernel Density Estimation

Figure 3 for Fast and Extensible Online Multivariate Kernel Density Estimation

Figure 4 for Fast and Extensible Online Multivariate Kernel Density Estimation

Abstract:We present xokde++, a state-of-the-art online kernel density estimation approach that maintains Gaussian mixture models input data streams. The approach follows state-of-the-art work on online density estimation, but was redesigned with computational efficiency, numerical robustness, and extensibility in mind. Our approach produces comparable or better results than the current state-of-the-art, while achieving significant computational performance gains and improved numerical stability. The use of diagonal covariance Gaussian kernels, which further improve performance and stability, at a small loss of modelling quality, is also explored. Our approach is up to 40 times faster, while requiring 90\% less memory than the closest state-of-the-art counterpart.

* 17 pages, 1 figure, 7 tables, submission to Pattern Recognition Letters, review version

Via

Access Paper or Ask Questions

Summarization of Films and Documentaries Based on Subtitles and Scripts

Mar 09, 2016

Marta Aparício, Paulo Figueiredo, Francisco Raposo, David Martins de Matos, Ricardo Ribeiro, Luís Marujo

Figure 1 for Summarization of Films and Documentaries Based on Subtitles and Scripts

Figure 2 for Summarization of Films and Documentaries Based on Subtitles and Scripts

Figure 3 for Summarization of Films and Documentaries Based on Subtitles and Scripts

Figure 4 for Summarization of Films and Documentaries Based on Subtitles and Scripts

Abstract:We assess the performance of generic text summarization algorithms applied to films and documentaries, using the well-known behavior of summarization of news articles as reference. We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics are used for comparing generated summaries against news abstracts, plot summaries, and synopses. We show that the best performing algorithms are LSA, for news articles and documentaries, and LexRank and Support Sets, for films. Despite the different nature of films and documentaries, their relative behavior is in accordance with that obtained for news articles.

* Pattern Recognition Letters, Volume 73, 1 April 2016, Pages 7-12
* 7 pages, 9 tables, 4 figures, submitted to Pattern Recognition Letters (Elsevier)

Via

Access Paper or Ask Questions

Using Generic Summarization to Improve Music Information Retrieval Tasks

Mar 09, 2016

Francisco Raposo, Ricardo Ribeiro, David Martins de Matos

Figure 1 for Using Generic Summarization to Improve Music Information Retrieval Tasks

Figure 2 for Using Generic Summarization to Improve Music Information Retrieval Tasks

Figure 3 for Using Generic Summarization to Improve Music Information Retrieval Tasks

Figure 4 for Using Generic Summarization to Improve Music Information Retrieval Tasks

Abstract:In order to satisfy processing time constraints, many MIR tasks process only a segment of the whole music signal. This practice may lead to decreasing performance, since the most important information for the tasks may not be in those processed segments. In this paper, we leverage generic summarization algorithms, previously applied to text and speech summarization, to summarize items in music datasets. These algorithms build summaries, that are both concise and diverse, by selecting appropriate segments from the input signal which makes them good candidates to summarize music as well. We evaluate the summarization process on binary and multiclass music genre classification tasks, by comparing the performance obtained using summarized datasets against the performances obtained using continuous segments (which is the traditional method used for addressing the previously mentioned time constraints) and full songs of the same original dataset. We show that GRASSHOPPER, LexRank, LSA, MMR, and a Support Sets-based Centrality model improve classification performance when compared to selected 30-second baselines. We also show that summarized datasets lead to a classification performance whose difference is not statistically significant from using full songs. Furthermore, we make an argument stating the advantages of sharing summarized datasets for future MIR research.

* IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24, n. 6, March 2016
* 24 pages, 10 tables; Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

Via

Access Paper or Ask Questions

Generation of Multimedia Artifacts: An Extractive Summarization-based Approach

Aug 13, 2015

Paulo Figueiredo, Marta Aparício, David Martins de Matos, Ricardo Ribeiro

Figure 1 for Generation of Multimedia Artifacts: An Extractive Summarization-based Approach

Figure 2 for Generation of Multimedia Artifacts: An Extractive Summarization-based Approach

Abstract:We explore methods for content selection and address the issue of coherence in the context of the generation of multimedia artifacts. We use audio and video to present two case studies: generation of film tributes, and lecture-driven science talks. For content selection, we use centrality-based and diversity-based summarization, along with topic analysis. To establish coherence, we use the emotional content of music, for film tributes, and ensure topic similarity between lectures and documentaries, for science talks. Composition techniques for the production of multimedia artifacts are addressed as a means of organizing content, in order to improve coherence. We discuss our results considering the above aspects.

* 7 pages, 2 figures

Via

Access Paper or Ask Questions

Privacy-Preserving Multi-Document Summarization

Aug 06, 2015

Luís Marujo, José Portêlo, Wang Ling, David Martins de Matos, João P. Neto, Anatole Gershman, Jaime Carbonell, Isabel Trancoso, Bhiksha Raj

Figure 1 for Privacy-Preserving Multi-Document Summarization

Figure 2 for Privacy-Preserving Multi-Document Summarization

Figure 3 for Privacy-Preserving Multi-Document Summarization

Figure 4 for Privacy-Preserving Multi-Document Summarization

Abstract:State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties. In this paper we propose a privacy-preserving approach to multi-document summarization. Our approach enables other parties to obtain summaries without learning anything else about the original documents' content. We use a hashing scheme known as Secure Binary Embeddings to convert documents representation containing key phrases and bag-of-words into bit strings, allowing the computation of approximate distances, instead of exact ones. Our experiments indicate that our system yields similar results to its non-private counterpart on standard multi-document evaluation datasets.

* 4 pages, In Proceedings of 2nd ACM SIGIR Workshop on Privacy-Preserving Information Retrieval, August 2015. arXiv admin note: text overlap with arXiv:1407.5416

Via

Access Paper or Ask Questions

Extending a Single-Document Summarizer to Multi-Document: a Hierarchical Approach

Jul 10, 2015

Luís Marujo, Ricardo Ribeiro, David Martins de Matos, João P. Neto, Anatole Gershman, Jaime Carbonell

Figure 1 for Extending a Single-Document Summarizer to Multi-Document: a Hierarchical Approach

Figure 2 for Extending a Single-Document Summarizer to Multi-Document: a Hierarchical Approach

Figure 3 for Extending a Single-Document Summarizer to Multi-Document: a Hierarchical Approach

Abstract:The increasing amount of online content motivated the development of multi-document summarization methods. In this work, we explore straightforward approaches to extend single-document summarization methods to multi-document summarization. The proposed methods are based on the hierarchical combination of single-document summaries, and achieves state of the art results.

* 6 pages, Please cite: Proceedings of *SEM: the 4th Joint Conference on Lexical and Computational Semantics (bibtex: http://aclweb.org/anthology/S/S15/S15-1020.bib)

Via

Access Paper or Ask Questions

Towards Using Machine Translation Techniques to Induce Multilingual Lexica of Discourse Markers

Mar 31, 2015

António Lopes, David Martins de Matos, Vera Cabarrão, Ricardo Ribeiro, Helena Moniz, Isabel Trancoso, Ana Isabel Mata

Abstract:Discourse markers are universal linguistic events subject to language variation. Although an extensive literature has already reported language specific traits of these events, little has been said on their cross-language behavior and on building an inventory of multilingual lexica of discourse markers. This work describes new methods and approaches for the description, classification, and annotation of discourse markers in the specific domain of the Europarl corpus. The study of discourse markers in the context of translation is crucial due to the idiomatic nature of these structures. Multilingual lexica together with the functional analysis of such structures are useful tools for the hard task of translating discourse markers into possible equivalents from one language to another. Using Daniel Marcu's validated discourse markers for English, extracted from the Brown Corpus, our purpose is to build multilingual lexica of discourse markers for other languages, based on machine translation techniques. The major assumption in this study is that the usage of a discourse marker is independent of the language, i.e., the rhetorical function of a discourse marker in a sentence in one language is equivalent to the rhetorical function of the same discourse marker in another language.

* 6 pages

Via

Access Paper or Ask Questions

On the Application of Generic Summarization Algorithms to Music

Jun 18, 2014

Francisco Raposo, Ricardo Ribeiro, David Martins de Matos

Figure 1 for On the Application of Generic Summarization Algorithms to Music

Abstract:Several generic summarization algorithms were developed in the past and successfully applied in fields such as text and speech summarization. In this paper, we review and apply these algorithms to music. To evaluate this summarization's performance, we adopt an extrinsic approach: we compare a Fado Genre Classifier's performance using truncated contiguous clips against the summaries extracted with those algorithms on 2 different datasets. We show that Maximal Marginal Relevance (MMR), LexRank and Latent Semantic Analysis (LSA) all improve classification performance in both datasets used for testing.

* IEEE Signal Processing Letters, IEEE, vol. 22, n. 1, January 2015
* 12 pages, 1 table; Submitted to IEEE Signal Processing Letters

Via

Access Paper or Ask Questions