ESI
Abstract:Test collections are essential for evaluating retrieval and re-ranking models. However, constructing such collections is challenging due to the high cost of manual annotation, particularly in specialized domains like Algerian legal texts, where high-quality corpora and relevance judgments are scarce. To address this limitation, we propose STCALIR, a framework for generating semi-synthetic test collections directly from raw legal documents. The pipeline follows the Cranfield paradigm, maintaining its core components of topics, corpus, and relevance judgments, while significantly reducing manual effort through automated multi-stage retrieval and filtering, achieving a 99% reduction in annotation workload. We validate STCALIR using the Mr. TyDi benchmark, demonstrating that the resulting semi-synthetic relevance judgments yield retrieval effectiveness comparable to human-annotated evaluations (Hit@10 \approx 0.785). Furthermore, system-level rankings derived from these labels exhibit strong concordance with human-based evaluations, as measured by Kendall's τ (0.89) and Spearman's \r{ho} (0.92). Overall, STCALIR offers a reproducible and cost-efficient solution for constructing reliable test collections in low-resource legal domains.


Abstract:Opinion mining and sentiment analysis in social media is a research issue having a great interest in the scientific community. However, before begin this analysis, we are faced with a set of problems. In particular, the problem of the richness of languages and dialects within these media. To address this problem, we propose in this paper an approach of construction and implementation of Syntactic analyzer named ASDA. This tool represents a parser for the Algerian dialect that label the terms of a given corpus. Thus, we construct a labeling table containing for each term its stem, different prefixes and suffixes, allowing us to determine the different grammatical parts a sort of POS tagging. This labeling will serve us later in the semantic processing of the Algerian dialect, like the automatic translation of this dialect or sentiment analysis