In natural language processing (NLP), lexical function is a concept to unambiguously represent semantic and syntactic features of words and phrases in text first crafted in the Meaning-Text Theory. Hierarchical classification of lexical functions involves organizing these features into a tree-like hierarchy of categories or labels. This is a challenging task as it requires a good understanding of the context and the relationships among words and phrases in text. It also needs large amounts of labeled data to train language models effectively. In this paper, we present a dataset of most frequent Spanish verb-noun collocations and sentences where they occur, each collocation is assigned to one of 37 lexical functions defined as classes for a hierarchical classification task. Each class represents a relation between the noun and the verb in a collocation involving their semantic and syntactic features. We combine the classes in a tree-based structure, and introduce classification objectives for each level of the structure. The dataset was created by dependency tree parsing and matching of the phrases in Spanish news. We provide baselines and data splits for each objective.
In this paper, we investigate the issue of hate speech by presenting a novel task of translating hate speech into non-hate speech text while preserving its meaning. As a case study, we use Spanish texts. We provide a dataset and several baselines as a starting point for further research in the task. We evaluated our baseline results using multiple metrics, including BLEU scores. The aim of this study is to contribute to the development of more effective methods for reducing the spread of hate speech in online communities.
Sarcasm detection is an essential task that can help identify the actual sentiment in user-generated data, such as discussion forums or tweets. Sarcasm is a sophisticated form of linguistic expression because its surface meaning usually contradicts its inner, deeper meaning. Such incongruity is the essential component of sarcasm, however, it makes sarcasm detection quite a challenging task. In this paper, we propose a model which incorporates emotion and sentiment features to capture the incongruity intrinsic to sarcasm. Moreover, we use CNN and pre-trained Transformer to capture context features. Our approach achieved state-of-the-art results on four datasets from social networking platforms and online media.