Athanasios Davvetas

AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems

Mar 10, 2026

Athanasios Davvetas, Michael Papademas, Xenia Ziouvelou, Vangelis Karkaletsis

Abstract: The rapid rollout of AI in heterogeneous public and societal sectors has escalated the need for compliance with regulatory standards and frameworks. The EU AI Act has emerged as a landmark in the regulatory landscape. The development of solutions that determine the level of AI systems' compliance with such standards is often limited by the lack of resources, hindering the semi-automated or automated evaluation of their performance. This generates the need for manual work, which is often error-prone, resource-limited or limited to cases not clearly described by the regulation. This paper presents an open, transparent, and reproducible method of creating a resource that facilitates the evaluation of NLP models with a strong focus on RAG systems. We have developed a dataset that contains the tasks of risk-level classification, article retrieval, obligation generation, and question-answering for the EU AI Act. The dataset files are in a format suited to machine-to-machine use. To generate the files, we utilise domain knowledge as an exegetical basis, combined with the processing and reasoning power of large language models, to generate scenarios along with the respective tasks. Our methodology demonstrates a way to harness language models for grounded generation with high document relevancy. In addition, we overcome limitations such as navigating the decision boundaries of risk levels that are not explicitly defined within the EU AI Act, such as limited and minimal cases. Finally, we demonstrate our dataset's effectiveness by evaluating a RAG-based solution that reaches 0.87 and 0.85 F1-score for prohibited and high-risk scenarios, respectively.
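To make the reported metric concrete, the following is a minimal sketch of how a per-class F1-score could be computed for a risk-level classification task. It is not the authors' evaluation code: the function name `f1_for_label`, the label strings, and the toy predictions are all assumptions made for illustration.

```python
def f1_for_label(y_true, y_pred, label):
    """Compute the F1-score of one class from parallel lists of labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    if tp == 0:
        # No correct prediction for this class: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted risk levels for four scenarios.
gold = ["prohibited", "high", "high", "minimal"]
pred = ["prohibited", "high", "minimal", "high"]

for risk_level in ("prohibited", "high"):
    print(risk_level, f1_for_label(gold, pred, risk_level))
```

Reporting F1 separately for each risk level, as the abstract does for prohibited and high-risk scenarios, avoids hiding weak performance on one class behind a single aggregate number.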

* 10 pages, 1 figure, 4 tables, 2 equations


TAI Scan Tool: A RAG-Based Tool With Minimalistic Input for Trustworthy AI Self-Assessment

Jul 23, 2025

Athanasios Davvetas, Xenia Ziouvelou, Ypatia Dami, Alexis Kaponis, Konstantina Giouvanopoulou, Michael Papademas

Abstract: This paper introduces the TAI Scan Tool, a RAG-based TAI self-assessment tool with minimalistic input. The current version of the tool supports the legal TAI assessment, with a particular emphasis on facilitating compliance with the AI Act. It involves a two-step approach with a pre-screening and an assessment phase. The assessment output of the system includes insight regarding the risk level of the AI system according to the AI Act, while at the same time retrieving relevant articles to aid with compliance and notify users of their obligations. Our qualitative evaluation using use-case scenarios yields promising results, correctly predicting risk levels while retrieving relevant articles across three distinct semantic groups. Furthermore, interpretation of results shows that the tool's reasoning relies on comparison with the setting of high-risk systems, a behaviour attributed to the fact that their deployment requires careful consideration and that they are therefore frequently presented within the AI Act.

* 9 pages, 1 figure, 4 tables


WeLa-VAE: Learning Alternative Disentangled Representations Using Weak Labels

Aug 22, 2020

Vasilis Margonis, Athanasios Davvetas, Iraklis A. Klampanos

Abstract: Learning disentangled representations without supervision or inductive biases often leads to non-interpretable or undesirable representations. On the other hand, strict supervision requires detailed knowledge of the true generative factors, which is not always possible. In this paper, we consider weak supervision by means of high-level labels that are not assumed to be explicitly related to the ground truth factors. Such labels, while being easier to acquire, can also be used as inductive biases for algorithms to learn more interpretable or alternative disentangled representations. To this end, we propose WeLa-VAE, a variational inference framework where observations and labels share the same latent variables, which involves the maximization of a modified variational lower bound and total correlation regularization. Our method is a generalization of TCVAE, adding only one extra hyperparameter. We experiment on a dataset generated by Cartesian coordinates and we show that, while a TCVAE learns a factorized Cartesian representation, given weak labels of distance and angle, WeLa-VAE is able to learn and disentangle a polar representation. This is achieved without the need for refined labels or having to adjust the number of layers, the optimization parameters, or the total correlation hyperparameter.
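As an illustration of what "weak labels of distance and angle" might look like for data generated from Cartesian coordinates, here is a small sketch that bins a point's radius and angle into coarse categories. The binning scheme, the number of bins, the assumed coordinate range, and the function name `weak_polar_labels` are assumptions for this example and are not taken from the paper.

```python
import math


def weak_polar_labels(x, y, n_dist_bins=3, n_angle_bins=4, max_radius=math.sqrt(2)):
    """Map a Cartesian point to coarse (distance, angle) category indices.

    Assumes points lie in [-1, 1] x [-1, 1], so the largest possible
    radius is sqrt(2). Both labels are coarse bins, not exact values.
    """
    radius = math.hypot(x, y)
    angle = math.atan2(y, x) % (2 * math.pi)  # shift the angle into [0, 2*pi)
    dist_label = min(int(radius / max_radius * n_dist_bins), n_dist_bins - 1)
    angle_label = min(int(angle / (2 * math.pi) * n_angle_bins), n_angle_bins - 1)
    return dist_label, angle_label


# Hypothetical points: one far from the origin, one close to it.
far_point = weak_polar_labels(-1.0, 0.5)
near_point = weak_polar_labels(0.1, -0.1)
print(far_point, near_point)
```

Labels of this kind carry only high-level information about the polar factors, which matches the abstract's framing of labels that are easier to acquire than the exact generative factors.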


Unsupervised Severe Weather Detection Via Joint Representation Learning Over Textual and Weather Data

May 14, 2020

Athanasios Davvetas, Iraklis A. Klampanos

Abstract: When observing a phenomenon, severe cases or anomalies are often characterised by deviation from the expected data distribution. However, non-deviating data samples may also implicitly lead to severe outcomes. In the case of unsupervised severe weather detection, these data samples can lead to mispredictions, since the predictors of severe weather are often not directly observed as features. We posit that incorporating external or auxiliary information, such as the outcome of an external task or an observation, can improve the decision boundaries of an unsupervised detection algorithm. In this paper, we increase the effectiveness of a clustering method to detect cases of severe weather by learning augmented and linearly separable latent representations. We evaluate our solution against three individual cases of severe weather, namely windstorms, floods and tornado outbreaks.


Learning Improved Representations by Transferring Incomplete Evidence Across Heterogeneous Tasks

Dec 22, 2019

Athanasios Davvetas, Iraklis A. Klampanos

Abstract: Acquiring ground truth labels for unlabelled data can be a costly procedure, since it often requires manual labour that is error-prone. Consequently, the amount of available labelled data is reduced by the limitations of manual data labelling. It is possible to increase the number of labelled data samples by performing automated labelling or crowd-sourcing the annotation procedure. However, these approaches often introduce noise or uncertainty in the labelset, which leads to decreased performance of supervised deep learning methods. On the other hand, weak supervision methods remain robust to noisy labelsets and can be effective even with small amounts of labelled data. In this paper we evaluate the effectiveness of a representation learning method called "Evidence Transfer", which uses external categorical evidence, when only a small amount of corresponding evidence is available, which we term incomplete evidence. Evidence transfer is a robust solution against external unknown categorical evidence that can introduce noise or uncertainty. In our experimental evaluation, evidence transfer proves to be effective and robust against different levels of incompleteness, for two types of incomplete evidence.

* 8 pages, 2 figures, 4 tables


Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence

Nov 09, 2018

Athanasios Davvetas, Iraklis A. Klampanos, Vangelis Karkaletsis

Abstract: In this paper we introduce evidence transfer for clustering, a deep learning method that can incrementally manipulate the latent representations of an autoencoder, according to external categorical evidence, in order to improve a clustering outcome. It is deployed on a baseline solution to reduce the cross entropy between the external evidence and an extension of the latent space. We define evidence transfer as the process by which the categorical outcome of an external, auxiliary task is exploited to improve a primary task, in this case representation learning for clustering. Our proposed method makes no assumptions regarding the categorical evidence presented, nor the structure of the latent space. We compare our method against the baseline solution by performing k-means clustering before and after its deployment. Experiments with three different kinds of evidence show that our method effectively manipulates the latent representations when introduced with real corresponding evidence, while remaining robust when presented with low-quality evidence.
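The quantity the abstract says is reduced, the cross entropy between external categorical evidence and an output derived from the latent space, can be sketched as follows. This is only an illustration of the loss term under simplifying assumptions: the evidence is one-hot, the "extension of the latent space" is stood in for by a plain list of logits, and the names `softmax` and `cross_entropy` are helpers written for this example, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(evidence_one_hot, predicted_probs, eps=1e-12):
    """Cross entropy between one-hot evidence and predicted probabilities."""
    return -sum(
        e * math.log(p + eps) for e, p in zip(evidence_one_hot, predicted_probs)
    )


# Hypothetical evidence category (index 0 of 2) and two candidate outputs.
evidence = [1, 0]
uninformed = cross_entropy(evidence, softmax([0.0, 0.0]))  # uniform prediction
aligned = cross_entropy(evidence, softmax([3.0, 0.0]))     # agrees with evidence
print(uninformed, aligned)
```

An output that agrees with the evidence yields a lower cross entropy than an uninformed one, which is the direction in which the training signal described in the abstract would push the latent representation.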


ANNETT-O: An Ontology for Describing Artificial Neural Network Evaluation, Topology and Training

May 10, 2018

Iraklis A. Klampanos, Athanasios Davvetas, Antonis Koukourikos, Vangelis Karkaletsis

Abstract: Deep learning models, while effective and versatile, are becoming increasingly complex, often including multiple overlapping networks of arbitrary depths, multiple objectives and non-intuitive training methodologies. This makes it increasingly difficult for researchers and practitioners to design, train and understand them. In this paper we present ANNETT-O, a much-needed, generic and computer-actionable vocabulary for researchers and practitioners to describe their deep learning configurations, training procedures and experiments. The proposed ontology focuses on topological, training and evaluation aspects of complex deep neural configurations, while keeping peripheral entities more succinct. Knowledge bases implementing ANNETT-O can support a wide variety of queries, providing relevant insights to users. In addition to a detailed description of the ontology, we demonstrate its suitability for the task via a number of hypothetical use-cases of increasing complexity.
