Nicolò Felicioni

Fast Adversarial Attacks with Gradient Prediction

May 14, 2026

Kamil Ciosek, Aleksandr V. Petrov, Nicolò Felicioni, Konstantina Palla

Abstract:Generating adversarial examples at scale is a core primitive for robustness evaluation, adversarial training, and red-teaming, yet even "fast" attacks such as FGSM remain throughput-limited by the cost of a backward pass. We introduce a family of attacks that eliminates the backward pass by predicting the input gradient from forward-pass hidden states via a lightweight linear regression. The approach is motivated by a kernel view of neural networks and is exact in the Neural Tangent Kernel regime, while remaining effective for practical finite-width models. Empirically, our methods recover much of FGSM's attack performance while using only a small fraction of the time, corresponding to a $532\%$ increase in throughput. These results suggest gradient prediction as a simple and general route to significantly faster adversarial generation under realistic wall-clock constraints.
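The idea in this abstract can be illustrated with a minimal sketch. It assumes a linear map `W` from a hidden-state vector `h` to a predicted input gradient has already been fitted by regression; `W`, `h`, `predict_gradient`, and `fgsm_step` are hypothetical names for illustration, not the authors' code. The only standard piece is the FGSM update, which moves each input coordinate by `eps` in the direction of the gradient's sign.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def predict_gradient(W, h):
    """Predict the input gradient as the matrix-vector product W @ h.

    W is a list of rows (one per input coordinate) and h is a hidden-state
    vector. Both are assumed inputs; how W is fitted is not shown here.
    """
    return [sum(w * hj for w, hj in zip(row, h)) for row in W]


def fgsm_step(x, grad, eps):
    """Standard FGSM update: shift each coordinate by eps * sign(gradient)."""
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Toy usage: no backward pass is needed once W and h are available.
W = [[1.0, 0.0], [0.0, -1.0]]
h = [2.0, 3.0]
x_adv = fgsm_step([0.5, 0.5], predict_gradient(W, h), eps=0.1)
```

In this sketch the expensive gradient computation is replaced by one small matrix-vector product, which is where any throughput gain would come from.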

* 17 pages

A Bayesian Information-Theoretic Approach to Data Attribution

Apr 04, 2026

Dharmesh Tailor, Nicolò Felicioni, Kamil Ciosek

Abstract:Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce - the entropy increase at a query when removed. This criterion credits examples for resolving predictive uncertainty rather than label noise. To scale to modern networks, we approximate information loss using a Gaussian Process surrogate built from tangent features. We show this aligns with classical influence scores for single-example attribution while promoting diversity for subsets. For even larger-scale retrieval, we relax to an information-gain objective and add a variance correction for scalable attribution in vector databases. Experiments show competitive performance on counterfactual sensitivity, ground-truth retrieval and coreset selection, showing that our method scales to modern architectures while bridging principled measures with practice.

* Accepted at the 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026)

Measuring Uncertainty Calibration

Dec 15, 2025

Kamil Ciosek, Nicolò Felicioni, Sina Ghiassian, Juan Elenter Litwin, Francesco Tonolini, David Gustaffson, Eva Garcia Martin, Carmen Barcena Gonzales, Raphaëlle Bertrand-Lalo

Abstract:We make two contributions to the problem of estimating the $L_1$ calibration error of a binary classifier from a finite dataset. First, we provide an upper bound for any classifier where the calibration function has bounded variation. Second, we provide a method of modifying any classifier so that its calibration error can be upper bounded efficiently without significantly impacting classifier performance and without any restrictive assumptions. All our results are non-asymptotic and distribution-free. We conclude by providing advice on how to measure calibration error in practice. Our methods yield practical procedures that can be run on real-world datasets with modest overhead.
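For context, the quantity being estimated can be approximated with the widely used equal-width binned estimator sketched below. This is a generic baseline, not the bound or the classifier modification proposed in the paper; the function name and the choice of ten bins are assumptions made for illustration.

```python
def binned_l1_calibration_error(probs, labels, num_bins=10):
    """Equal-width binned estimate of the L1 calibration error.

    probs: predicted probabilities of the positive class, each in [0, 1].
    labels: binary outcomes (0 or 1), same length as probs.
    """
    # Per bin: [count, sum of predicted probabilities, sum of labels].
    bins = [[0, 0.0, 0.0] for _ in range(num_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * num_bins), num_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += p
        bins[idx][2] += y
    n = len(probs)
    # The weighted average of |mean prob - mean label| over bins simplifies to this.
    return sum(abs(sp - sy) for count, sp, sy in bins if count) / n
```

Binned estimates of this kind depend on the number of bins, which is one reason non-asymptotic guarantees are of interest.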

* 28 pages

Linear Gradient Prediction with Control Variates

Nov 07, 2025

Kamil Ciosek, Nicolò Felicioni, Juan Elenter Litwin

Abstract:We propose a new way of training neural networks, with the goal of reducing training cost. Our method uses approximate predicted gradients instead of the full gradients that require an expensive backward pass. We derive a control-variate-based technique that ensures our updates are unbiased estimates of the true gradient. Moreover, we propose a novel way to derive a predictor for the gradient inspired by the theory of the Neural Tangent Kernel. We empirically show the efficacy of the technique on a vision transformer classification task.
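One generic way to make a predicted gradient unbiased is sketched below. It assumes the true gradient is computed only with probability `p` and the correction is reweighted by `1/p`; this is a textbook control-variate construction and may differ from the scheme derived in the paper. The names `control_variate_estimate` and `true_gradient_fn` are hypothetical.

```python
import random


def control_variate_estimate(predicted, true_gradient_fn, p, rng):
    """Return an estimate whose expectation equals the true gradient.

    predicted: cheap predicted gradient (list of floats).
    true_gradient_fn: zero-argument callable standing in for the backward pass.
    p: probability of paying for the true gradient, with 0 < p <= 1.
    rng: any object with a random() method returning a float in [0, 1).
    """
    if rng.random() < p:
        true = true_gradient_fn()
        # Correct the prediction; dividing by p keeps the estimate unbiased.
        return [g_hat + (g - g_hat) / p for g_hat, g in zip(predicted, true)]
    # Otherwise skip the backward pass and use the prediction alone.
    return list(predicted)


# Toy usage with a fixed "true" gradient.
estimate = control_variate_estimate(
    [1.0, -2.0], lambda: [1.5, -1.0], p=0.25, rng=random.Random(0)
)
```

Averaging over the two branches gives `predicted + (true - predicted)`, which is the true gradient. The better the prediction, the smaller the correction term and hence the variance.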

Hallucination Detection on a Budget: Efficient Bayesian Estimation of Semantic Entropy

Apr 04, 2025

Kamil Ciosek, Nicolò Felicioni, Sina Ghiassian

Abstract:Detecting whether an LLM hallucinates is an important research challenge. One promising way of doing so is to estimate the semantic entropy (Farquhar et al., 2024) of the distribution of generated sequences. We propose a new algorithm for doing that, with two main advantages. First, due to us taking the Bayesian approach, we achieve a much better quality of semantic entropy estimates for a given budget of samples from the LLM. Second, we are able to tune the number of samples adaptively so that 'harder' contexts receive more samples. We demonstrate empirically that our approach systematically beats the baselines, requiring only 59% of samples used by Farquhar et al. (2024) to achieve the same quality of hallucination detection as measured by AUROC. Moreover, quite counterintuitively, our estimator is useful even with just one sample from the LLM.
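As a point of reference, a simple plug-in estimate of semantic entropy can be sketched as follows. It assumes each sampled answer has already been assigned a meaning-cluster identifier by some external step, and it uses empirical cluster frequencies; it is not the Bayesian estimator proposed in the paper, and the function name is hypothetical.

```python
import math
from collections import Counter


def plug_in_semantic_entropy(cluster_ids):
    """Entropy (in nats) of the empirical distribution over meaning clusters.

    cluster_ids: one hashable cluster label per sampled LLM answer.
    """
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Toy usage: four samples, three of which share a meaning.
entropy = plug_in_semantic_entropy(["paris", "paris", "paris", "lyon"])
```

With a single sample this plug-in estimate is always zero, so it carries no signal in that setting.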

* 22 pages

AutoOPE: Automated Off-Policy Estimator Selection

Jun 26, 2024

Nicolò Felicioni, Michael Benigni, Maurizio Ferrari Dacrema

Abstract:The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of counterfactual policies with data collected by another one. This problem is of utmost importance for various application domains, e.g., recommendation systems, medical treatments, and many others. To solve the OPE problem, we resort to estimators, which aim to estimate as accurately as possible the performance that the counterfactual policies would have had if they were deployed in place of the logging policy. In the literature, several estimators have been developed, all with different characteristics and theoretical guarantees. Therefore, there is no dominant estimator, and each estimator may be the best one for different OPE problems, depending on the characteristics of the dataset at hand. While the selection of the estimator is a crucial choice for an accurate OPE, this problem has been widely overlooked in the literature. We propose an automated data-driven OPE estimator selection method based on machine learning. In particular, the core idea we propose in this paper is to create several synthetic OPE tasks and use a machine learning model trained to predict the best estimator for those synthetic tasks. We empirically show how our method is able to generalize to unseen tasks and make a better estimator selection compared to a baseline method on several real-world datasets, with a computational cost significantly lower than that of the baseline.
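To make the notion of an OPE estimator concrete, the sketch below implements inverse propensity scoring (IPS), one widely used estimator. The abstract does not say which estimators AutoOPE selects among, so this is an illustrative example only; the function and argument names are assumptions.

```python
def ips_estimate(rewards, logging_probs, target_probs):
    """Inverse propensity scoring estimate of the target policy's value.

    rewards: observed reward for each logged interaction.
    logging_probs: probability the logging policy gave to the logged action.
    target_probs: probability the target policy gives to that same action.
    """
    # Importance weights correct for the mismatch between the two policies.
    weights = [t / l for t, l in zip(target_probs, logging_probs)]
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


# Toy usage: four logged interactions from a uniform two-action logging policy.
value = ips_estimate(
    rewards=[1, 0, 1, 1],
    logging_probs=[0.5, 0.5, 0.5, 0.5],
    target_probs=[0.5, 0.5, 1.0, 0.0],
)
```

IPS is unbiased when the logging probabilities are known and nonzero, but its variance grows with large weights. Trade-offs of this kind are why no single estimator dominates.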

On the Importance of Uncertainty in Decision-Making with Large Language Models

Apr 03, 2024

Nicolò Felicioni, Lucas Maystre, Sina Ghiassian, Kamil Ciosek

Abstract:We investigate the role of uncertainty in decision-making problems with natural language as input. For such tasks, using Large Language Models as agents has become the norm. However, none of the recent approaches employ any additional phase for estimating the uncertainty the agent has about the world during the decision-making task. We focus on a fundamental decision-making framework with natural language as input, namely contextual bandits in which the context information consists of text. As a representative of the approaches with no uncertainty estimation, we consider an LLM bandit with a greedy policy, which picks the action corresponding to the largest predicted reward. We compare this baseline to LLM bandits that make active use of uncertainty estimation by integrating the uncertainty in a Thompson Sampling policy. We employ different techniques for uncertainty estimation, such as Laplace Approximation, Dropout, and Epinets. We empirically show on real-world data that the greedy policy performs worse than the Thompson Sampling policies. These findings suggest that, while overlooked in the LLM literature, uncertainty plays a fundamental role in bandit tasks with LLMs.
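The contrast between the two policies can be sketched in a much simpler setting than the paper's. The example below assumes a non-contextual Bernoulli bandit with Beta posteriors (uniform prior) instead of an LLM with Laplace, Dropout, or Epinet uncertainty; all names are hypothetical.

```python
import random


def greedy_action(successes, failures):
    """Pick the arm with the largest posterior-mean reward (no exploration)."""
    means = [(s + 1) / (s + f + 2) for s, f in zip(successes, failures)]
    return max(range(len(means)), key=means.__getitem__)


def thompson_action(successes, failures, rng):
    """Sample one reward per arm from its Beta posterior and pick the best."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


# Toy usage: arm 0 has one observed success, arm 1 has never been tried.
rng = random.Random(0)
greedy_choice = greedy_action([1, 0], [0, 0])
sampled_choices = {thompson_action([1, 0], [0, 0], rng) for _ in range(200)}
```

In this toy case the greedy policy always returns arm 0, whereas Thompson Sampling still tries the unexplored arm some of the time because its posterior is wide.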

* 12 pages of main content, 25 pages with references and appendix

Measuring the User Satisfaction in a Recommendation Interface with Multiple Carousels

May 14, 2021

Nicolò Felicioni, Maurizio Ferrari Dacrema, Paolo Cremonesi

Abstract:It is common for video-on-demand and music streaming services to adopt a user interface composed of several recommendation lists, i.e. widgets or swipeable carousels, each generated according to a specific criterion or algorithm (e.g. most recent, top popular, recommended for you, editors' choice, etc.). Selecting the appropriate combination of carousels has a significant impact on user satisfaction. A crucial aspect of this user interface is that, to measure the relevance of a new carousel for the user, it is not sufficient to account solely for its individual quality. Instead, it should be considered that other carousels will already be present in the interface. This is not considered by traditional evaluation protocols for recommender systems, in which each carousel is evaluated in isolation, regardless of (i) which other carousels are displayed to the user and (ii) the relative position of the carousel with respect to other carousels. Hence, we propose a two-dimensional evaluation protocol for a carousel setting that will measure the quality of a recommendation carousel based on how much it improves upon the quality of an already available set of carousels. Our evaluation protocol also takes into account the position bias, i.e. users do not explore the carousels sequentially, but rather concentrate on the top-left corner of the screen. We report experiments on the movie domain and notice that under a carousel setting the definition of which criteria should be preferred to generate a list of recommended items changes with respect to what is commonly understood.

* ACM International Conference on Interactive Media Experiences (IMX '21), June 21–23, 2021, Virtual Event, NY, USA

A Methodology for the Offline Evaluation of Recommender Systems in a User Interface with Multiple Carousels

May 13, 2021

Nicolò Felicioni, Maurizio Ferrari Dacrema, Paolo Cremonesi

Abstract:Many video-on-demand and music streaming services provide the user with a page consisting of several recommendation lists, i.e. widgets or swipeable carousels, each built with a specific criterion (e.g. most recent, TV series, etc.). Finding efficient strategies to select which carousels to display is an active research topic of great industrial interest. In this setting, the overall quality of the recommendations of a new algorithm cannot be assessed by measuring solely its individual recommendation quality. Rather, it should be evaluated in a context where other recommendation lists are already available, to account for how they complement each other. This is not considered by traditional offline evaluation protocols. Hence, we propose an offline evaluation protocol for a carousel setting in which the recommendation quality of a model is measured by how much it improves upon that of an already available set of carousels. We report experiments on publicly available datasets in the movie domain and notice that under a carousel setting the ranking of the algorithms changes. In particular, when a SLIM carousel is available, matrix factorization models tend to be preferred, while item-based models are penalized. We also propose to extend ranking metrics to the two-dimensional carousel layout in order to account for a known position bias, i.e. users will not explore the lists sequentially, but rather concentrate on the top-left corner of the screen.

* Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization (UMAP '21 Adjunct), June 21–25, 2021, Utrecht, Netherlands
