Leonardo Rigutini

SLIMER-IT: Zero-Shot NER on Italian Language

Sep 24, 2024

Andrew Zamai, Leonardo Rigutini, Marco Maggini, Andrea Zugarini

Abstract: Traditional approaches to Named Entity Recognition (NER) frame the task as a BIO sequence labeling problem. Although these systems often excel in the downstream task at hand, they require extensive annotated data and struggle to generalize to out-of-distribution input domains and unseen entity types. In contrast, Large Language Models (LLMs) have demonstrated strong zero-shot capabilities. While several works address Zero-Shot NER in English, little has been done in other languages. In this paper, we define an evaluation framework for Zero-Shot NER, applying it to the Italian language. Furthermore, we introduce SLIMER-IT, the Italian version of SLIMER, an instruction-tuning approach for zero-shot NER leveraging prompts enriched with a definition and guidelines. Comparisons with other state-of-the-art models demonstrate the superiority of SLIMER-IT on never-seen-before entity tags.
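To make the BIO framing mentioned in the abstract concrete, here is a minimal sketch of how entity spans can be turned into per-token BIO tags. The function name `spans_to_bio`, the span format (start index, exclusive end index, label), and the example sentence are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans (start, end_exclusive, label) into BIO tags.

    Hypothetical helper for illustration: the first token of an entity gets
    "B-<label>", the following tokens get "I-<label>", everything else "O".
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Example sentence and spans are made up for this sketch.
tokens = ["Leonardo", "da", "Vinci", "nacque", "a", "Vinci", "."]
spans = [(0, 3, "PER"), (5, 6, "LOC")]
bio_tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

A sequence labeler trained on such tags can only predict the labels it saw during training, which is the limitation on unseen entity types that the abstract points to.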

Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER

Jul 02, 2024

Andrew Zamai, Andrea Zugarini, Leonardo Rigutini, Marco Ernandes, Marco Maggini

Abstract: Recently, several specialized instruction-tuned Large Language Models (LLMs) for Named Entity Recognition (NER) have emerged. Compared to traditional NER approaches, these models have strong generalization capabilities. Existing LLMs mainly focus on zero-shot NER in out-of-domain distributions, being fine-tuned on an extensive number of entity classes that often highly or completely overlap with test sets. In this work, we instead propose SLIMER, an approach designed to tackle never-seen-before named entity tags by instructing the model on fewer examples and by leveraging a prompt enriched with a definition and guidelines. Experiments demonstrate that the definition and guidelines yield better performance and faster, more robust learning, particularly when labelling unseen Named Entities. Furthermore, SLIMER performs comparably to state-of-the-art approaches in out-of-domain zero-shot NER, while being trained on a reduced tag set.
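As a rough illustration of a prompt enriched with a definition and guidelines, the sketch below assembles an instruction for a single entity tag. The template wording, the function name `build_ner_prompt`, the one-tag-per-prompt layout, and the requested JSON output are all assumptions made for this example; the abstract does not give the actual prompt.

```python
def build_ner_prompt(text: str, tag: str, definition: str, guidelines: str) -> str:
    """Assemble a zero-shot NER instruction for a single entity tag.

    The template below is hypothetical; it only illustrates the idea of
    adding a definition and annotation guidelines to the instruction.
    """
    return (
        f"Input text: {text}\n\n"
        f"Extract every span of type '{tag}' from the input text.\n"
        f"Definition: {definition}\n"
        f"Guidelines: {guidelines}\n"
        "Answer with a JSON list of the extracted spans, or [] if there are none."
    )


# Example values are invented for this sketch.
prompt = build_ner_prompt(
    text="Dante Alighieri nacque a Firenze nel 1265.",
    tag="LOCATION",
    definition="A LOCATION is a named geographical place such as a city, region, or country.",
    guidelines="Do not label people or organizations, even when their names contain a place name.",
)
print(prompt)
```

Because the tag is described in natural language rather than only named, the same template can be reused for a tag that never appeared in the training data by swapping in a new definition and guidelines.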

A Turkish Educational Crossword Puzzle Generator

May 15, 2024

Kamyar Zeinalipour, Yusuf Gökberk Keptiğ, Marco Maggini, Leonardo Rigutini, Marco Gori

Abstract: This paper introduces the first Turkish crossword puzzle generator designed to leverage the capabilities of large language models (LLMs) for educational purposes. In this work, we introduce two specially created datasets: one with over 180,000 unique answer-clue pairs for generating relevant clues from a given answer, and another with over 35,000 samples containing text, answer, category, and clue data, aimed at producing clues for specific texts and keywords within certain categories. Beyond entertainment, this generator serves as an interactive educational tool that enhances memory, vocabulary, and problem-solving skills. It is a notable step in AI-enhanced education, merging game-like engagement with learning and setting new standards for interactive, intelligent learning tools in Turkish.

* This paper has been accepted for presentation at AIED2024 LBR

Clue-Instruct: Text-Based Clue Generation for Educational Crossword Puzzles

Apr 09, 2024

Andrea Zugarini, Kamyar Zeinalipour, Surya Sai Kadali, Marco Maggini, Marco Gori, Leonardo Rigutini

Abstract: Crossword puzzles are popular linguistic games often used as tools to engage students in learning. Educational crosswords are characterized by less cryptic and more factual clues that distinguish them from traditional crossword puzzles. Although several publicly available clue-answer pair databases exist for traditional crosswords, educational clue-answer pair datasets are missing. In this article, we propose a methodology to build educational clue generation datasets that can be used to instruct Large Language Models (LLMs). By gathering informative content associated with relevant keywords from Wikipedia pages, we use Large Language Models to automatically generate pedagogical clues related to the given input keyword and its context. With this approach, we created clue-instruct, a dataset containing 44,075 unique examples with text-keyword pairs associated with three distinct crossword clues. We used clue-instruct to instruct different LLMs to generate educational clues from a given input content and keyword. Both human and automatic evaluations confirmed the quality of the generated clues, thus validating the effectiveness of our approach.

Enhancing Modern Supervised Word Sense Disambiguation Models by Semantic Lexical Resources

Feb 20, 2024

Stefano Melacci, Achille Globo, Leonardo Rigutini

Abstract: Supervised models for Word Sense Disambiguation (WSD) currently yield state-of-the-art results on the most popular benchmarks. Despite the recent introduction of Word Embeddings and Recurrent Neural Networks to design powerful context-related features, the interest in improving WSD models using Semantic Lexical Resources (SLRs) is mostly restricted to knowledge-based approaches. In this paper, we enhance "modern" supervised WSD models by exploiting two popular SLRs: WordNet and WordNet Domains. We propose an effective way to introduce semantic features into the classifiers, and we consider using the SLR structure to augment the training data. We study the effect of different types of semantic features, investigating their interaction with local contexts encoded by means of mixtures of Word Embeddings or Recurrent Neural Networks, and we extend the proposed model into a novel multi-layer architecture for WSD. A detailed experimental comparison in the recent Unified Evaluation Framework (Raganato et al., 2017) shows that the proposed approach leads to supervised models that compare favourably with the state of the art.

* Proceedings of The 11th International Conference on Language Resources and Evaluation (LREC 2018)

Multitask Kernel-based Learning with Logic Constraints

Feb 16, 2024

Michelangelo Diligenti, Marco Gori, Marco Maggini, Leonardo Rigutini

Abstract: This paper presents a general framework to integrate prior knowledge, in the form of logic constraints among a set of task functions, into kernel machines. The logic propositions provide a partial representation of the environment in which the learner operates, and the learning algorithm exploits this representation together with the information available in the supervised examples. In particular, we consider a multi-task learning scheme, where multiple unary predicates on the feature space are to be learned by kernel machines and a higher-level abstract representation consists of logic clauses on these predicates, known to hold for any input. A general approach is presented to convert the logic clauses into a continuous implementation that processes the outputs computed by the kernel-based predicates. The learning task is formulated as a primal optimization problem of a loss function that combines a term measuring the fitting of the supervised examples, a regularization term, and a penalty term that enforces the constraints on both supervised and unsupervised examples. The proposed semi-supervised learning framework is particularly suited for learning in high-dimensional feature spaces, where the supervised training examples tend to be sparse and generalization is difficult. Unlike for standard kernel machines, the cost function to optimize is not generally guaranteed to be convex. However, the experimental results show that it is still possible to find good solutions using a two-stage learning scheme, in which the supervised examples are first learned until convergence and the logic constraints are then enforced. Some promising experimental results on artificial multi-task learning tasks are reported, showing how the classification accuracy can be effectively improved by exploiting the a priori rules and the unsupervised examples.
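The three-term objective described in the abstract can be written schematically as below. This is only a sketch: the symbols are assumptions introduced here (task functions $f_1,\dots,f_T$ with outputs in $[0,1]$, labeled sets $\mathcal{L}_k$, a set $\mathcal{S}$ of supervised and unsupervised points, weights $\lambda_r,\lambda_c$, and $\phi_h \in [0,1]$ as the continuous degree of satisfaction of the $h$-th clause), and the paper's exact formulation may differ.

```latex
\min_{f_1,\dots,f_T}\;
\underbrace{\sum_{k=1}^{T}\sum_{(x,y)\in\mathcal{L}_k} \ell\bigl(f_k(x),\,y\bigr)}_{\text{fitting of supervised examples}}
\;+\;
\underbrace{\lambda_r \sum_{k=1}^{T} \lVert f_k \rVert^{2}}_{\text{regularization}}
\;+\;
\underbrace{\lambda_c \sum_{h=1}^{H}\sum_{x\in\mathcal{S}} \Bigl(1-\phi_h\bigl(f_1(x),\dots,f_T(x)\bigr)\Bigr)}_{\text{constraint penalty}}
```

As one possible continuous conversion (an assumed choice, using the product t-norm), the clause $a(x) \Rightarrow b(x)$ becomes $\phi(f_a, f_b) = 1 - f_a(x)\,\bigl(1 - f_b(x)\bigr)$, so its penalty $f_a(x)\,(1 - f_b(x))$ vanishes exactly when the clause is satisfied by crisp outputs. Products of this kind are what make the overall objective non-convex in general, consistent with the remark in the abstract.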

* Proceedings of the 19th European Conference on Artificial Intelligence (ECAI 2010)

A novel integrated industrial approach with cobots in the age of industry 4.0 through conversational interaction and computer vision

Feb 16, 2024

Andrea Pazienza, Nicola Macchiarulo, Felice Vitulano, Antonio Fiorentini, Marco Cammisa, Leonardo Rigutini, Ernesto Di Iorio, Achille Globo, Antonio Trevisi

Abstract: From robots that replace workers to robots that serve as helpful colleagues, the field of robotic automation is experiencing a new trend that represents a huge challenge for component manufacturers. The contribution starts from an innovative vision that sees an ever closer collaboration among the cobot, able to do a specific physical job with precision; the AI world, able to analyze information and support the decision-making process; and the human, able to have a strategic vision of the future.

* Proceedings of the 6th Italian Conference on Computational Linguistics (CLiC-it 2019)

Neural paraphrasing by automatically crawled and aligned sentence pairs

Feb 16, 2024

Achille Globo, Antonio Trevisi, Andrea Zugarini, Leonardo Rigutini, Marco Maggini, Stefano Melacci

Abstract: Paraphrasing is the task of re-writing an input text using other words, without altering the meaning of the original content. Conversational systems can exploit automatic paraphrasing to make the conversation more natural, e.g., talking about a certain topic using different paraphrases at different times. Recently, the task of automatically generating paraphrases has been approached in the context of Natural Language Generation (NLG). While many existing systems consist simply of rule-based models, the recent success of Deep Neural Networks in several NLG tasks naturally suggests the possibility of exploiting such networks for generating paraphrases. However, the main obstacle to neural-network-based paraphrasing is the lack of large datasets with aligned pairs of sentences and paraphrases, which are needed to train the neural models effectively. In this paper we present a method for the automatic generation of large aligned corpora, based on the assumption that news and blog websites talk about the same events using different narrative styles. We propose a similarity search procedure with linguistic constraints that, given a reference sentence, is able to locate the most similar candidate paraphrases among millions of indexed sentences. The data generation process is evaluated in the case of the Italian language, performing experiments using pointer-based deep neural architectures.

* Proceedings of The 6th International Conference on Social Networks Analysis, Management and Security (SNAMS 2019)

Data Augmentation and Transfer Learning Approaches Applied to Facial Expressions Recognition

Feb 15, 2024

Enrico Randellini, Leonardo Rigutini, Claudio Sacca'

Abstract: Facial expression is the first thing we pay attention to when we want to understand a person's state of mind. Thus, the automatic recognition of facial expressions is a very interesting research field. In this paper, because of the small size of available training datasets, we propose a novel data augmentation technique that improves performance in the recognition task. We apply geometrical transformations and build GAN models from scratch that generate new synthetic images for each emotion type. We then fine-tune pretrained convolutional neural networks with different architectures on the augmented datasets. To measure the generalization ability of the models, we apply an extra-database protocol: we train models on the augmented versions of the training dataset and test them on two different databases. The combination of these techniques yields average accuracy values of around 85% for the InceptionResNetV2 model.

* Proceedings of the 11th International Conference on Artificial Intelligence, Soft Computing and Applications (AIAA 2021)

Fast Vocabulary Transfer for Language Model Compression

Feb 15, 2024

Leonidas Gee, Andrea Zugarini, Leonardo Rigutini, Paolo Torroni

Abstract: Real-world business applications require a trade-off between language model performance and size. We propose a new method for model compression that relies on vocabulary transfer. We evaluate the method on various vertical domains and downstream tasks. Our results indicate that vocabulary transfer can be effectively used in combination with other compression techniques, yielding a significant reduction in model size and inference time while marginally compromising on performance.
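The abstract does not spell out how vocabulary transfer works. One plausible scheme, assumed here purely for illustration, is to build a smaller in-domain vocabulary and initialize each of its token embeddings as the mean of the embeddings of the pieces the original tokenizer would split that token into. The names `greedy_tokenize` and `transfer_embeddings`, the toy longest-match tokenizer, and the embedding values are all hypothetical.

```python
from typing import Dict, List


def greedy_tokenize(word: str, vocab: Dict[str, List[float]]) -> List[str]:
    """Toy longest-match tokenizer over the old vocabulary (illustrative only)."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word!r} at position {i}")
    return pieces


def transfer_embeddings(
    old_emb: Dict[str, List[float]], new_vocab: List[str]
) -> Dict[str, List[float]]:
    """Initialize embeddings for new_vocab from old_emb.

    Tokens shared with the old vocabulary are copied; other tokens get the
    mean of the embeddings of their old-tokenizer pieces (assumed scheme).
    """
    new_emb = {}
    for token in new_vocab:
        if token in old_emb:
            new_emb[token] = list(old_emb[token])
            continue
        pieces = greedy_tokenize(token, old_emb)
        dim = len(old_emb[pieces[0]])
        new_emb[token] = [
            sum(old_emb[p][d] for p in pieces) / len(pieces) for d in range(dim)
        ]
    return new_emb


# Toy two-dimensional embeddings, invented for this sketch.
old_emb = {"con": [1.0, 0.0], "tract": [0.0, 2.0], "law": [0.5, 0.5]}
new_emb = transfer_embeddings(old_emb, ["law", "contract"])
print(new_emb)
```

In this toy setup the in-domain word "contract" becomes a single token instead of two, which hints at why a domain-specific vocabulary can shorten input sequences and reduce inference time.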

* Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022): Industry Track
