Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Edgar Dobriban

Statistical Methods in Generative AI

Sep 08, 2025

Edgar Dobriban

Abstract:Generative Artificial Intelligence is emerging as an important technology, promising to be transformative in many areas. At the same time, generative AI techniques are based on sampling from probabilistic models, and by default, they come with no guarantees about correctness, safety, fairness, or other properties. Statistical methods offer a promising potential approach to improve the reliability of generative AI techniques. In addition, statistical methods are also promising for improving the quality and efficiency of AI evaluation, as well as for designing interventions and experiments in AI. In this paper, we review some of the existing work on these topics, explaining both the general statistical techniques used, as well as their applications to generative AI. We also discuss limitations and potential future directions.

* Invited review paper for Annual Review of Statistics and Its Application. Feedback welcome

Via

Access Paper or Ask Questions

Foundations of Top-$k$ Decoding For Language Models

May 25, 2025

Georgy Noarov, Soham Mallick, Tao Wang, Sunay Joshi, Yan Sun, Yangxinyu Xie, Mengxin Yu, Edgar Dobriban

Abstract:Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token-probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider \emph{Bregman decoders} obtained by minimizing a separable Bregman divergence (for both the \emph{primal} and \emph{dual} cases) with a sparsity-inducing $\ell_0$ regularization. Despite the combinatorial nature of the objective, we show how to optimize it efficiently for a large class of divergences. We show that the optimal decoding strategies are greedy, and further that the loss function is discretely convex in $k$, so that binary search provably and efficiently finds the optimal $k$. We show that top-$k$ decoding arises as a special case for the KL divergence, and identify new decoding strategies that have distinct behaviors (e.g., non-linearly up-weighting larger probabilities after re-normalization).

Via

Access Paper or Ask Questions

Synthetic-Powered Predictive Inference

May 19, 2025

Meshi Bashari, Roy Maor Lotan, Yonghoon Lee, Edgar Dobriban, Yaniv Romano

Abstract:Conformal prediction is a framework for predictive inference with a distribution-free, finite-sample guarantee. However, it tends to provide uninformative prediction sets when calibration data are scarce. This paper introduces Synthetic-powered predictive inference (SPPI), a novel framework that incorporates synthetic data -- e.g., from a generative model -- to improve sample efficiency. At the core of our method is a score transporter: an empirical quantile mapping that aligns nonconformity scores from trusted, real data with those from synthetic data. By carefully integrating the score transporter into the calibration process, SPPI provably achieves finite-sample coverage guarantees without making any assumptions about the real and synthetic data distributions. When the score distributions are well aligned, SPPI yields substantially tighter and more informative prediction sets than standard conformal prediction. Experiments on image classification and tabular regression demonstrate notable improvements in predictive efficiency in data-scarce settings.

Via

Access Paper or Ask Questions

Likelihood-Ratio Regularized Quantile Regression: Adapting Conformal Prediction to High-Dimensional Covariate Shifts

Feb 18, 2025

Sunay Joshi, Shayan Kiyani, George Pappas, Edgar Dobriban, Hamed Hassani

Abstract:We consider the problem of conformal prediction under covariate shift. Given labeled data from a source domain and unlabeled data from a covariate shifted target domain, we seek to construct prediction sets with valid marginal coverage in the target domain. Most existing methods require estimating the unknown likelihood ratio function, which can be prohibitive for high-dimensional data such as images. To address this challenge, we introduce the likelihood ratio regularized quantile regression (LR-QR) algorithm, which combines the pinball loss with a novel choice of regularization in order to construct a threshold function without directly estimating the unknown likelihood ratio. We show that the LR-QR method has coverage at the desired level in the target domain, up to a small error term that we can control. Our proofs draw on a novel analysis of coverage via stability bounds from learning theory. Our experiments demonstrate that the LR-QR algorithm outperforms existing methods on high-dimensional prediction tasks, including a regression task for the Communities and Crime dataset, and an image classification task from the WILDS repository.

Via

Access Paper or Ask Questions

A Confidence Interval for the $\ell_2$ Expected Calibration Error

Aug 16, 2024

Yan Sun, Pratik Chaudhari, Ian J. Barnett, Edgar Dobriban

$Figure 1 for A Confidence Interval for the $\ell_2$ Expected Calibration Error$

$Figure 2 for A Confidence Interval for the $\ell_2$ Expected Calibration Error$

$Figure 3 for A Confidence Interval for the $\ell_2$ Expected Calibration Error$

$Figure 4 for A Confidence Interval for the $\ell_2$ Expected Calibration Error$

Abstract:Recent advances in machine learning have significantly improved prediction accuracy in various applications. However, ensuring the calibration of probabilistic predictions remains a significant challenge. Despite efforts to enhance model calibration, the rigorous statistical evaluation of model calibration remains less explored. In this work, we develop confidence intervals the $\ell_2$ Expected Calibration Error (ECE). We consider top-1-to-$k$ calibration, which includes both the popular notion of confidence calibration as well as full calibration. For a debiased estimator of the ECE, we show asymptotic normality, but with different convergence rates and asymptotic variances for calibrated and miscalibrated models. We develop methods to construct asymptotically valid confidence intervals for the ECE, accounting for this behavior as well as non-negativity. Our theoretical findings are supported through extensive experiments, showing that our methods produce valid confidence intervals with shorter lengths compared to those obtained by resampling-based methods.

Via

Access Paper or Ask Questions

Watermarking Language Models with Error Correcting Codes

Jun 12, 2024

Patrick Chao, Edgar Dobriban, Hamed Hassani

Figure 1 for Watermarking Language Models with Error Correcting Codes

Figure 2 for Watermarking Language Models with Error Correcting Codes

Figure 3 for Watermarking Language Models with Error Correcting Codes

Figure 4 for Watermarking Language Models with Error Correcting Codes

Abstract:Recent progress in large language models enables the creation of realistic machine-generated content. Watermarking is a promising approach to distinguish machine-generated text from human text, embedding statistical signals in the output that are ideally undetectable to humans. We propose a watermarking framework that encodes such signals through an error correcting code. Our method, termed robust binary code (RBC) watermark, introduces no distortion compared to the original probability distribution, and no noticeable degradation in quality. We evaluate our watermark on base and instruction fine-tuned models and find our watermark is robust to edits, deletions, and translations. We provide an information-theoretic perspective on watermarking, a powerful statistical test for detection and for generating p-values, and theoretical guarantees. Our empirical findings suggest our watermark is fast, powerful, and robust, comparing favorably to the state-of-the-art.

Via

Access Paper or Ask Questions

One-Shot Safety Alignment for Large Language Models via Optimal Dualization

May 29, 2024

Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Hamed Hassani, Dongsheng Ding

Abstract:The growing safety concerns surrounding Large Language Models (LLMs) raise an urgent need to align them with diverse human preferences to simultaneously enhance their helpfulness and safety. A promising approach is to enforce safety constraints through Reinforcement Learning from Human Feedback (RLHF). For such constrained RLHF, common Lagrangian-based primal-dual policy optimization methods are computationally expensive and often unstable. This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem. We do so by pre-optimizing a smooth and convex dual function that has a closed form. This shortcut eliminates the need for cumbersome primal-dual policy iterations, thus greatly reducing the computational burden and improving training stability. Our strategy leads to two practical algorithms in model-based and preference-based scenarios (MoCAN and PeCAN, respectively). A broad range of experiments demonstrate the effectiveness of our methods.

Via

Access Paper or Ask Questions

Uncertainty in Language Models: Assessment through Rank-Calibration

Apr 04, 2024

Xinmeng Huang, Shuo Li, Mengxin Yu, Matteo Sesia, Hamed Hassani, Insup Lee, Osbert Bastani, Edgar Dobriban

Figure 1 for Uncertainty in Language Models: Assessment through Rank-Calibration

Figure 2 for Uncertainty in Language Models: Assessment through Rank-Calibration

Figure 3 for Uncertainty in Language Models: Assessment through Rank-Calibration

Figure 4 for Uncertainty in Language Models: Assessment through Rank-Calibration

Abstract:Language Models (LMs) have shown promising performance in natural language generation. However, as LMs often generate incorrect or hallucinated responses, it is crucial to correctly quantify their uncertainty in responding to given inputs. In addition to verbalized confidence elicited via prompting, many uncertainty measures ($e.g.$, semantic entropy and affinity-graph-based measures) have been proposed. However, these measures can differ greatly, and it is unclear how to compare them, partly because they take values over different ranges ($e.g.$, $[0,\infty)$ or $[0,1]$). In this work, we address this issue by developing a novel and practical framework, termed $Rank$-$Calibration$, to assess uncertainty and confidence measures for LMs. Our key tenet is that higher uncertainty (or lower confidence) should imply lower generation quality, on average. Rank-calibration quantifies deviations from this ideal relationship in a principled manner, without requiring ad hoc binary thresholding of the correctness score ($e.g.$, ROUGE or METEOR). The broad applicability and the granular interpretability of our methods are demonstrated empirically.

Via

Access Paper or Ask Questions

Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms

Apr 01, 2024

Leda Wang, Zhixiang Zhang, Edgar Dobriban

Figure 1 for Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms

Figure 2 for Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms

Figure 3 for Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms

Figure 4 for Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms

Abstract:Randomized algorithms can be used to speed up the analysis of large datasets. In this paper, we develop a unified methodology for statistical inference via randomized sketching or projections in two of the most fundamental problems in multivariate statistical analysis: least squares and PCA. The methodology applies to fixed datasets -- i.e., is data-conditional -- and the only randomness is due to the randomized algorithm. We propose statistical inference methods for a broad range of sketching distributions, such as the subsampled randomized Hadamard transform (SRHT), Sparse Sign Embeddings (SSE) and CountSketch, sketching matrices with i.i.d. entries, and uniform subsampling. To our knowledge, no comparable methods are available for SSE and for SRHT in PCA. Our novel theoretical approach rests on showing the asymptotic normality of certain quadratic forms. As a contribution of broader interest, we show central limit theorems for quadratic forms of the SRHT, relying on a novel proof via a dyadic expansion that leverages the recursive structure of the Hadamard transform. Numerical experiments using both synthetic and empirical datasets support the efficacy of our methods, and in particular suggest that sketching methods can have better computation-estimation tradeoffs than recently proposed optimal subsampling methods.

Via

Access Paper or Ask Questions

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024

Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer(+2 more)

Figure 1 for JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Figure 2 for JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Figure 3 for JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Figure 4 for JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Abstract:Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques do not adequately address. First, there is no clear standard of practice regarding jailbreaking evaluation. Second, existing works compute costs and success rates in incomparable ways. And third, numerous works are not reproducible, as they withhold adversarial prompts, involve closed-source code, or rely on evolving proprietary APIs. To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) a new jailbreaking dataset containing 100 unique behaviors, which we call JBB-Behaviors; (2) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (3) a standardized evaluation framework that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard that tracks the performance of attacks and defenses for various LLMs. We have carefully considered the potential ethical implications of releasing this benchmark, and believe that it will be a net positive for the community. Over time, we will expand and adapt the benchmark to reflect technical and methodological advances in the research community.

Via

Access Paper or Ask Questions