Abstract:Selecting artificial intelligence (AI) models, such as large language models (LLMs), from multiple candidates requires accurate performance estimation. This is ideally achieved through empirical evaluations involving abundant real-world data. However, such evaluations are costly and impractical at scale. To address this challenge, autoevaluation methods leverage synthetic data produced by automated evaluators, such as LLMs-as-judges, reducing variance but potentially introducing bias. Recent approaches have employed semi-supervised prediction-powered inference (\texttt{PPI}) to correct for the bias of autoevaluators. However, in practice, the use of autoevaluators may degrade sample efficiency compared to conventional methods that rely only on real-world data. In this paper, we propose \texttt{R-AutoEval+}, a novel framework that provides finite-sample reliability guarantees on the model evaluation, while also ensuring an enhanced (or at least no worse) sample efficiency compared to conventional methods. The key innovation of \texttt{R-AutoEval+} is an adaptive construction of the model evaluation variable, which dynamically tunes its reliance on synthetic data, reverting to conventional methods when the autoevaluator is insufficiently accurate. Experiments using LLMs-as-judges for optimizing the quantization settings of LLM weights and for prompt design confirm the reliability and efficiency of \texttt{R-AutoEval+}.
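As a rough illustration of the kind of estimator involved, the sketch below combines autoevaluator (LLM-as-judge) scores on abundant synthetic data with a bias correction computed on a small real-labeled set, using a reliance parameter lam that reverts to the conventional real-data-only average when set to zero. All names are illustrative, and the data-driven choice of lam shown here is an assumption, not the paper's \texttt{R-AutoEval+} construction.

```python
import numpy as np

def ppi_style_estimate(real_scores, judge_on_real, judge_on_synth, lam=None):
    """Hedged sketch of a PPI-style autoevaluation estimate.

    real_scores:    ground-truth scores on the small labeled set
    judge_on_real:  autoevaluator (LLM-as-judge) scores on the same labeled set
    judge_on_synth: autoevaluator scores on the large synthetic/unlabeled set
    lam:            reliance on synthetic data; lam=0 reverts to the
                    conventional real-data-only estimate
    """
    real_scores = np.asarray(real_scores, float)
    judge_on_real = np.asarray(judge_on_real, float)
    judge_on_synth = np.asarray(judge_on_synth, float)

    if lam is None:
        # Illustrative (assumed) data-driven choice: trust the judge more
        # when its scores track the real scores closely on the labeled set.
        cov = np.cov(real_scores, judge_on_real)
        lam = float(np.clip(cov[0, 1] / (cov[1, 1] + 1e-12), 0.0, 1.0))

    # lam * (judge mean on synthetic data) + bias correction on labeled data.
    estimate = lam * judge_on_synth.mean() + (real_scores - lam * judge_on_real).mean()
    return estimate, lam
```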
Abstract:Low-Rank Adaptation (LoRA), which introduces a product of two trainable low-rank matrices into frozen pre-trained weights, is widely used for efficient fine-tuning of language models in federated learning (FL). However, when combined with differentially private stochastic gradient descent (DP-SGD), LoRA faces substantial noise amplification: DP-SGD perturbs per-sample gradients, and the matrix multiplication of the LoRA update ($BA$) intensifies this effect. Freezing one matrix (e.g., $A$) reduces the noise but restricts model expressiveness, often resulting in suboptimal adaptation. To address this, we propose FedSVD, a simple yet effective method that introduces a global reparameterization based on singular value decomposition (SVD). In our approach, each client optimizes only the $B$ matrix and transmits it to the server. The server aggregates the $B$ matrices, computes the product $BA$ using the previous $A$, and refactorizes the result via SVD. This yields a new adaptive $A$ composed of the orthonormal right singular vectors of $BA$, and an updated $B$ containing the remaining SVD components. This reparameterization avoids quadratic noise amplification, while allowing $A$ to better capture the principal directions of the aggregate updates. Moreover, the orthonormal structure of $A$ bounds the gradient norms of $B$ and preserves more signal under DP-SGD, as confirmed by our theoretical analysis. As a result, FedSVD consistently improves stability and performance across a variety of privacy settings and benchmarks, outperforming relevant baselines under both private and non-private regimes.
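The server-side refactorization described above can be sketched in a few lines of NumPy. The aggregation of client updates by simple averaging and the function and variable names below are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def fedsvd_server_step(client_Bs, A_prev, rank):
    """Hedged sketch of FedSVD-style server-side SVD reparameterization.

    client_Bs: list of per-client B matrices, each of shape (d_out, r)
    A_prev:    previous global A matrix, shape (r, d_in)
    rank:      LoRA rank r
    """
    # Aggregate the client updates of B (simple average as an illustration).
    B_agg = np.mean(np.stack(client_Bs, axis=0), axis=0)

    # Form the aggregate low-rank update and refactorize it via SVD.
    BA = B_agg @ A_prev                               # (d_out, d_in), rank <= r
    U, S, Vt = np.linalg.svd(BA, full_matrices=False)

    # New A: orthonormal right singular vectors of BA (top-r rows of Vt).
    A_new = Vt[:rank, :]
    # New B: the remaining SVD factors, so that B_new @ A_new reproduces BA.
    B_new = U[:, :rank] * S[:rank]
    return B_new, A_new
```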
Abstract:AI is poised to revolutionize telecommunication networks by boosting efficiency, automation, and decision-making. However, the black-box nature of most AI models introduces substantial risk, possibly deterring adoption by network operators. These risks are not addressed by the current prevailing deployment strategy, which typically follows a best-effort train-and-deploy paradigm. This paper reviews conformal calibration, a general framework that moves beyond the state of the art by adopting computationally lightweight, advanced statistical tools that offer formal reliability guarantees without requiring further training or fine-tuning. Conformal calibration encompasses pre-deployment calibration via uncertainty quantification or hyperparameter selection; online monitoring to detect and mitigate failures in real time; and counterfactual post-deployment performance analysis to address "what if" diagnostic questions after deployment. By weaving conformal calibration into the AI model lifecycle, network operators can establish confidence in black-box AI models as a dependable enabling technology for wireless systems.
Abstract:Consider an edge computing setting in which a user submits queries for the solution of a linear system to an edge processor, which is subject to time-varying computing availability. The edge processor applies a probabilistic linear solver (PLS) so as to be able to respond to the user's query within the allotted time and computing budget. Feedback to the user is in the form of an uncertainty set. Due to model misspecification, the uncertainty set obtained via a direct application of PLS does not come with coverage guarantees with respect to the true solution of the linear system. This work introduces a new method to calibrate the uncertainty sets produced by PLS with the aim of guaranteeing long-term coverage requirements. The proposed method, referred to as online conformal prediction-PLS (OCP-PLS), assumes sporadic feedback from cloud to edge. This enables the online calibration of uncertainty thresholds via online conformal prediction (OCP), an online optimization method previously studied in the context of prediction models. The validity of OCP-PLS is verified via experiments that bring insights into trade-offs between coverage, prediction set size, and cloud usage.
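A minimal sketch of the kind of online threshold update used by OCP is given below; the specific step-size rule and names are illustrative assumptions rather than the precise OCP-PLS recursion.

```python
def ocp_threshold_update(theta, missed, target_miscoverage=0.1, step=0.05):
    """One online conformal prediction (OCP) update of the set threshold (sketch).

    theta:   current threshold controlling the size of the uncertainty set
    missed:  True if the sporadic cloud feedback indicates the set did not
             cover the true solution of the linear system
    """
    # Grow the threshold after a miss and shrink it slowly after a cover,
    # so that the long-term miscoverage rate tracks the target level.
    return theta + step * (float(missed) - target_miscoverage)
```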
Abstract:Stochastic first-order methods for empirical risk minimization employ gradient approximations based on sampled data in lieu of exact gradients. Such constructions introduce noise into the learning dynamics, which can be corrected through variance-reduction techniques. There is increasing evidence in the literature that, in many modern learning applications, noise can have a beneficial effect on optimization and generalization. To this end, the recently proposed variance-reduction technique alpha-SVRG [Yin et al., 2023] allows for fine-grained control of the level of residual noise in the learning dynamics, and has been reported to empirically outperform both SGD and SVRG in modern deep learning scenarios. Focusing on strongly convex settings, we first provide a unified convergence rate expression for alpha-SVRG under a fixed learning rate, which reduces to that of SGD or SVRG by setting alpha=0 or alpha=1, respectively. We then show that alpha-SVRG achieves a faster convergence rate than both SGD and SVRG under a suitable choice of alpha. Simulation results on linear regression validate our theory.
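The alpha-SVRG stochastic direction interpolates between SGD and SVRG as sketched below; variable names are illustrative.

```python
def alpha_svrg_direction(grad_i_x, grad_i_snap, full_grad_snap, alpha):
    """Stochastic direction used by alpha-SVRG (hedged sketch).

    grad_i_x:       gradient of the sampled loss f_i at the current iterate
    grad_i_snap:    gradient of the same f_i at the snapshot point
    full_grad_snap: full (exact) gradient at the snapshot point
    alpha:          residual-noise level; alpha=0 gives SGD, alpha=1 gives SVRG
    """
    return grad_i_x - alpha * (grad_i_snap - full_grad_snap)
```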
Abstract:Federated Learning (FL) is a widely used framework for training models in a decentralized manner, ensuring that the central server does not have direct access to data from local clients. However, this approach may still fail to fully preserve data privacy, as models from local clients are exposed to the central server during the aggregation process. This issue becomes even more critical when training vision-language models (VLMs) with FL, as VLMs can easily memorize training data instances, making them vulnerable to membership inference attacks (MIAs). To address this challenge, we propose the FedRand framework, which avoids disclosing the full set of client parameters. In this framework, each client randomly selects subparameters of Low-Rank Adaptation (LoRA) from the server and keeps the remaining counterparts of the LoRA weights as private parameters. After training both sets of parameters on the client's private dataset, only the non-private client parameters are sent back to the server for aggregation. This approach mitigates the risk of exposing client-side VLM parameters, thereby enhancing data privacy. We empirically validate across several benchmark datasets that FedRand improves robustness against MIAs compared to relevant baselines, while achieving accuracy comparable to methods that communicate the full LoRA parameters.
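At a high level, the client-side split described above amounts to a random partition of each LoRA layer's sub-matrices into a public part (returned to the server) and a private part (kept locally). The per-layer A/B coin flip below is an illustrative assumption, not the paper's exact selection rule.

```python
import random

def split_lora_params(lora_layer_names, seed=None):
    """Hedged sketch of a FedRand-style public/private parameter split.

    For each LoRA layer, one sub-matrix (A or B) is randomly designated as
    public (trained and sent back to the server), and the other is kept
    private on the client.
    """
    rng = random.Random(seed)
    public, private = [], []
    for name in lora_layer_names:
        if rng.random() < 0.5:
            public.append(f"{name}.lora_A")
            private.append(f"{name}.lora_B")
        else:
            public.append(f"{name}.lora_B")
            private.append(f"{name}.lora_A")
    return public, private
```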
Abstract:Modern software-defined networks, such as Open Radio Access Network (O-RAN) systems, rely on artificial intelligence (AI)-powered applications running on controllers interfaced with the radio access network. To ensure that these AI applications operate reliably at runtime, they must be properly calibrated before deployment. A promising and theoretically grounded approach to calibration is conformal prediction (CP), which enhances any AI model by transforming it into a provably reliable set predictor that provides error bars for estimates and decisions. CP requires calibration data that matches the distribution of the environment encountered during runtime. However, in practical scenarios, network controllers often have access only to data collected under different contexts -- such as varying traffic patterns and network conditions -- leading to a mismatch between the calibration and runtime distributions. This paper introduces a novel methodology to address this calibration-test distribution shift. The approach leverages meta-learning to develop a zero-shot estimator of distribution shifts, relying solely on contextual information. The proposed method, called meta-learned context-dependent weighted conformal prediction (ML-WCP), enables effective calibration of AI applications without requiring data from the current context. Additionally, it can incorporate data from multiple contexts to further enhance calibration reliability.
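Once context-dependent weights are available, calibration proceeds via a weighted conformal quantile, roughly as sketched below. The weights themselves would come from the meta-learned shift estimator, which is not implemented here; all names and the quantile rule are illustrative.

```python
import numpy as np

def weighted_conformal_quantile(cal_scores, cal_weights, test_weight, alpha=0.1):
    """Hedged sketch of the weighted conformal prediction calibration step."""
    scores = np.asarray(cal_scores, float)
    w = np.asarray(cal_weights, float)

    # Normalize the weights, reserving probability mass for the test point.
    p = np.append(w, test_weight)
    p = p / p.sum()

    # Smallest calibration score whose cumulative weight reaches 1 - alpha.
    order = np.argsort(scores)
    cum = np.cumsum(p[:-1][order])
    idx = np.searchsorted(cum, 1.0 - alpha)
    if idx >= len(scores):
        return float("inf")  # the prediction set must cover the whole space
    return float(scores[order][idx])
```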
Abstract:Deploying artificial intelligence (AI) models on edge devices involves a delicate balance between meeting stringent complexity constraints, such as limited memory and energy resources, and ensuring reliable performance in sensitive decision-making tasks. One way to enhance reliability is through uncertainty quantification via Bayesian inference. This approach, however, typically necessitates maintaining and running multiple models in an ensemble, which may exceed the computational limits of edge devices. This paper introduces a low-complexity methodology to address this challenge by distilling calibration information from a more complex model. In an offline phase, predictive probabilities generated by a high-complexity cloud-based model are leveraged to determine a threshold based on the typical divergence between the cloud and edge models. At run time, this threshold is used to construct credal sets -- ranges of predictive probabilities that are guaranteed, with a user-selected confidence level, to include the predictions of the cloud model. The credal sets are obtained through thresholding of a divergence measure in the simplex of predictive probabilities. Experiments on visual and language tasks demonstrate that the proposed approach, termed Conformalized Distillation for Credal Inference (CD-CI), significantly improves calibration performance compared to low-complexity Bayesian methods, such as Laplace approximation, making it a practical and efficient solution for edge AI deployments.
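A rough sketch of the two phases follows: an offline conformal threshold computed from divergences between cloud and edge predictive probabilities, and a run-time membership test that defines the credal set. The use of the KL divergence and the specific quantile rule are assumptions for illustration.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two predictive probability vectors (assumed measure)."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def calibrate_threshold(cloud_probs, edge_probs, confidence=0.9):
    """Offline phase (sketch): conformal quantile of cloud-vs-edge divergences."""
    scores = np.array([kl(c, e) for c, e in zip(cloud_probs, edge_probs)])
    n = len(scores)
    k = int(np.ceil((n + 1) * confidence))  # conformal rank correction
    return float(np.sort(scores)[min(k, n) - 1])

def in_credal_set(q, edge_prob, threshold):
    """Run time (sketch): the credal set contains all distributions q on the
    simplex whose divergence from the edge prediction is within the threshold."""
    return kl(q, edge_prob) <= threshold
```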
Abstract:Large Vision-Language Models (VLMs) have recently made remarkable progress in bridging two fundamental modalities. Trained on sufficiently large datasets, VLMs exhibit a comprehensive understanding of both visual and linguistic information, enabling them to perform diverse tasks. To distill this knowledge accurately, in this paper we introduce a novel approach that explicitly utilizes a VLM as an objective function for the Human-Object Interaction (HOI) detection task (\textbf{VLM-HOI}). Specifically, we propose a method that quantifies the similarity of the predicted HOI triplet using the image-text matching technique. We represent HOI triplets linguistically to fully exploit the language comprehension of VLMs, which are more suitable than CLIP models for this purpose owing to their localization and object-centric nature. The matching score is used as an objective for contrastive optimization. To our knowledge, this is the first utilization of the language abilities of VLMs for HOI detection. Experiments demonstrate the effectiveness of our method, achieving state-of-the-art HOI detection accuracy on benchmarks. We believe that integrating VLMs into HOI detection represents important progress towards a more advanced and interpretable analysis of human-object interactions.
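A possible form of the matching-based contrastive objective is sketched below; the triplet-to-text template and the generic match_score_fn callable are illustrative assumptions, not the paper's exact formulation.

```python
import math

def hoi_triplet_to_text(subject, verb, obj):
    """Verbalize an HOI triplet so a VLM image-text matching head can score it
    (the template is an assumption)."""
    return f"a photo of a {subject} {verb} a {obj}"

def contrastive_hoi_loss(match_score_fn, image, pred_triplet, negative_triplets, tau=0.07):
    """InfoNCE-style loss built from VLM image-text matching scores (sketch).

    match_score_fn: user-supplied callable (image, text) -> similarity score,
                    e.g. the matching head of a pre-trained VLM.
    """
    pos = match_score_fn(image, hoi_triplet_to_text(*pred_triplet)) / tau
    negs = [match_score_fn(image, hoi_triplet_to_text(*t)) / tau for t in negative_triplets]
    denom = math.exp(pos) + sum(math.exp(s) for s in negs)
    return -(pos - math.log(denom))
```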
Abstract:Bayesian optimization (BO) is a sequential approach for optimizing black-box objective functions using zeroth-order noisy observations. In BO, Gaussian processes (GPs) are employed as probabilistic surrogate models to estimate the objective function based on past observations, guiding the selection of future queries to maximize utility. However, the performance of BO heavily relies on the quality of these probabilistic estimates, which can deteriorate significantly under model misspecification. To address this issue, we introduce localized online conformal prediction-based Bayesian optimization (LOCBO), a BO algorithm that calibrates the GP model through localized online conformal prediction (CP). LOCBO corrects the GP likelihood based on the predictive sets produced by the localized online CP step, and the corrected GP likelihood is then denoised to obtain a calibrated posterior distribution on the objective function. The likelihood calibration step leverages an input-dependent calibration threshold to tailor coverage guarantees to different regions of the input space. Under minimal noise assumptions, we provide theoretical performance guarantees for LOCBO's iterates that hold for the unobserved objective function. These theoretical findings are validated through experiments on synthetic and real-world optimization tasks, demonstrating that LOCBO consistently outperforms state-of-the-art BO algorithms in the presence of model misspecification.
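One way to realize an input-dependent threshold is a kernel-weighted online update over a set of anchor inputs, as sketched below; this recursion is an assumption inspired by the abstract, not LOCBO's exact procedure.

```python
import numpy as np

def rbf_kernel(x, x_prime, length_scale=1.0):
    x, x_prime = np.asarray(x, float), np.asarray(x_prime, float)
    return float(np.exp(-np.sum((x - x_prime) ** 2) / (2 * length_scale ** 2)))

class LocalizedOnlineThreshold:
    """Hedged sketch of an input-dependent calibration threshold maintained by
    a localized online CP-style update (names and update rule are assumptions)."""

    def __init__(self, anchors, alpha=0.1, step=0.05):
        self.anchors = [np.asarray(a, float) for a in anchors]  # representative inputs
        self.theta = np.zeros(len(self.anchors))                # per-anchor thresholds
        self.alpha = alpha                                      # target miscoverage
        self.step = step

    def threshold(self, x):
        # Kernel-weighted combination of anchor thresholds near the query x.
        w = np.array([rbf_kernel(x, a) for a in self.anchors])
        return float(w @ self.theta / (w.sum() + 1e-12))

    def update(self, x, missed):
        # Raise thresholds near x when the predictive set missed the
        # observation; lower them slightly otherwise.
        w = np.array([rbf_kernel(x, a) for a in self.anchors])
        self.theta += self.step * w * (float(missed) - self.alpha)
```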