Abstract:Prediction sets provide a theoretically grounded framework for quantifying uncertainty in machine learning models. Adapting them to structured generation tasks, in particular, large language model (LLM) based code generation, remains a challenging problem. An existing attempt proposes PAC prediction sets but is limited by its strong monotonicity assumption on risk and single-label classification framework, which severely limits the space of candidate programs and cannot accommodate the multiple valid outputs inherent to code generation. To address these limitations, we propose an approach RisCoSet that leverages multiple hypothesis testing to construct risk-controlling predictions for LLM-based code generation. Given a trained code generation model, we produce a prediction set represented by a partial program, which is guaranteed to contain a correct solution with high confidence. Extensive experiments on three LLMs demonstrate the effectiveness of the proposed method. For instance, compared with the state-of-the-art, our method can significantly reduce the code removal by up to 24.5%, at the same level of risk.
Abstract:Conformal prediction methods provide statistically rigorous marginal coverage guarantees for machine learning models, but such guarantees fail to account for algorithmic biases, thereby undermining fairness and trust. This paper introduces a fair conformal inference framework for classification tasks. The proposed method constructs prediction sets that guarantee conditional coverage on adaptively identified subgroups, which can be implicitly defined through nonlinear feature combinations. By balancing effectiveness and efficiency in producing compact, informative prediction sets and ensuring adaptive equalized coverage across unfairly treated subgroups, our approach paves a practical pathway toward trustworthy machine learning. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the framework.




Abstract:Graph neural networks (GNNs) have been widely used in many real applications, and recent studies have revealed their vulnerabilities against topology attacks. To address this issue, existing efforts have mainly been dedicated to improving the robustness of GNNs, while little attention has been paid to the detection of such attacks. In this work, we study the victim node detection problem under topology attacks against GNNs. Our approach is built upon the key observation rooted in the intrinsic message passing nature of GNNs. That is, the neighborhood of a victim node tends to have two competing group forces, pushing the node classification results towards the original label and the targeted label, respectively. Based on this observation, we propose to detect victim nodes by deliberately designing an effective measurement of the neighborhood variance for each node. Extensive experimental results on four real-world datasets and five existing topology attacks show the effectiveness and efficiency of the proposed detection approach.