Cointegration is an important topic for time-series, and describes a relationship between two series in which a linear combination is stationary. Classically, the test for cointegration is based on a two stage process in which first the linear relation between the series is estimated by Ordinary Least Squares. Subsequently a unit root test is performed on the residuals. A well-known deficiency of this classical approach is that it can lead to erroneous conclusions about the presence of cointegration. As an alternative, we present a framework for estimating whether cointegration exists using Bayesian inference which is empirically superior to the classical approach. Finally, we apply our technique to model segmented cointegration in which cointegration may exist only for limited time. In contrast to previous approaches our model makes no restriction on the number of possible cointegration segments.
Knowledge-aware methods have boosted a range of Natural Language Processing applications over the last decades. With the gathered momentum, knowledge recently has been pumped into enormous attention in document summarization research. Previous works proved that knowledge-embedded document summarizers excel at generating superior digests, especially in terms of informativeness, coherence, and fact consistency. This paper pursues to present the first systematic survey for the state-of-the-art methodologies that embed knowledge into document summarizers. Particularly, we propose novel taxonomies to recapitulate knowledge and knowledge embeddings under the document summarization view. We further explore how embeddings are generated in learning architectures of document summarization models, especially in deep learning models. At last, we discuss the challenges of this topic and future directions.
How can an end-user provide feedback if a deployed structured prediction model generates inconsistent output, ignoring the structural complexity of human language? This is an emerging topic with recent progress in synthetic or constrained settings, and the next big leap would require testing and tuning models in real-world settings. We present a new dataset, Interscript, containing user feedback on a deployed model that generates complex everyday tasks. Interscript contains 8,466 data points -- the input is a possibly erroneous script and a user feedback, and the output is a modified script. We posit two use-cases of \ours that might significantly advance the state-of-the-art in interactive learning. The dataset is available at: https://github.com/allenai/interscript.
Keyphrase extraction is the task of finding several interesting phrases in a text document, which provide a list of the main topics within the document. Most existing graph-based models use co-occurrence links as cohesion indicators to model the relationship of syntactic elements. However, a word may have different forms of expression within the document, and may have several synonyms as well. Simply using co-occurrence information cannot capture this information. In this paper, we enhance the graph-based ranking model by leveraging word embeddings as background knowledge to add semantic information to the inter-word graph. Our approach is evaluated on established benchmark datasets and empirical results show that the word embedding neighborhood information improves the model performance.
Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, is an attractive topic due to its usefulness in real use-case scenarios. Recent work in this area made progress with disentanglement methods that separate utterance content and speaker characteristics. Although crucial, extracting disentangled prosody characteristics for unseen speakers remains an open issue. In this paper, we propose a novel self-supervised approach to effectively learn the prosody characteristics. Then, we use the learned prosodic representations to train our VC model for zero-shot conversion. Our evaluation demonstrates that we can efficiently extract disentangled prosody representation. Moreover, we show improved performance compared to the state-of-the-art zero-shot VC models.
Speech emotion recognition is a challenging and important research topic that plays a critical role in human-computer interaction. Multimodal inputs can improve the performance as more emotional information is used for recognition. However, existing studies learnt all the information in the sample while only a small portion of it is about emotion. Moreover, under the multimodal framework, the interaction between different modalities is shallow and insufficient. In this paper, a keysparse Transformer is proposed for efficient SER by only focusing on emotion related information. Furthermore, a cascaded cross-attention block, which is specially designed for multimodal framework, is introduced to achieve deep interaction between different modalities. The proposed method is evaluated by IEMOCAP corpus and the experimental results show that the proposed method gives better performance than the state-of-theart approaches.
To detect and segment objects in images based on their content is one of the most active topics in the field of computer vision. Nowadays, this problem can be addressed using Deep Learning architectures such as Faster R-CNN or YOLO, among others. In this paper, we study the behaviour of different configurations of RetinaNet, Faster R-CNN and Mask R-CNN presented in Detectron2. First, we evaluate qualitatively and quantitatively (AP) the performance of the pre-trained models on KITTI-MOTS and MOTSChallenge datasets. We observe a significant improvement in performance after fine-tuning these models on the datasets of interest and optimizing hyperparameters. Finally, we run inference in unusual situations using out of context datasets, and present interesting results that help us understanding better the networks.
To retrieve images based on their content is one of the most studied topics in the field of computer vision. Nowadays, this problem can be addressed using modern techniques such as feature extraction using machine learning, but over the years different classical methods have been developed. In this paper, we implement a query by example retrieval system for finding paintings in a museum image collection using classic computer vision techniques. Specifically, we study the performance of the color, texture, text and feature descriptors in datasets with different perturbations in the images: noise, overlapping text boxes, color corruption and rotation. We evaluate each of the cases using the Mean Average Precision (MAP) metric, and we obtain results that vary between 0.5 and 1.0 depending on the problem conditions.
A musical performance renders an acoustic realization of a musical score or other representation of a composition. Different performances of the same composition may vary in terms of performance parameters such as timing or dynamics, and these variations may have a major impact on how a listener perceives the music. The analysis of music performance has traditionally been a peripheral topic for the MIR research community, where often a single audio recording is used as representative of a musical work. This paper surveys the field of Music Performance Analysis (MPA) from several perspectives including the measurement of performance parameters, the relation of those parameters to the actions and intentions of a performer or perceptual effects on a listener, and finally the assessment of musical performance. This paper also discusses MPA as it relates to MIR, pointing out opportunities for collaboration and future research in both areas.
A number of exciting advances have been made in automated fact-checking thanks to increasingly larger datasets and more powerful systems, leading to improvements in the complexity of claims which can be accurately fact-checked. However, despite these advances, there are still desirable functionalities missing from the fact-checking pipeline. In this survey, we focus on the explanation functionality -- that is fact-checking systems providing reasons for their predictions. We summarize existing methods for explaining the predictions of fact-checking systems and we explore trends in this topic. Further, we consider what makes for good explanations in this specific domain through a comparative analysis of existing fact-checking explanations against some desirable properties. Finally, we propose further research directions for generating fact-checking explanations, and describe how these may lead to improvements in the research area.