



Abstract: Fairness concerns are increasingly critical as machine learning models are deployed in high-stakes applications. While existing fairness-aware methods typically intervene at the model level, they often suffer from high computational costs, limited scalability, and poor generalization. To address these challenges, we propose a Bayesian data selection framework that ensures fairness by aligning group-specific posterior distributions of model parameters and sample weights with a shared central distribution. Our framework supports flexible alignment via various distributional discrepancy measures, including Wasserstein distance, maximum mean discrepancy, and $f$-divergence, allowing geometry-aware control without imposing explicit fairness constraints. This data-centric approach mitigates group-specific biases in training data and improves fairness in downstream tasks, with theoretical guarantees. Experiments on benchmark datasets show that our method consistently outperforms existing data selection and model-based fairness methods in both fairness and accuracy.
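As a rough illustration of the alignment idea described in this abstract, the sketch below computes a squared maximum mean discrepancy between posterior draws for one group and draws from a shared central distribution; the Gaussian kernel, bandwidth, sample shapes, and all variable names are assumptions for a toy example, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel between sample sets of shape (n, d) and (m, d)."""
    sq_dists = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2 * x @ y.T
    return np.exp(-sq_dists / (2 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between empirical distributions."""
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()

# Toy usage: hypothetical posterior draws (parameters and sample weights stacked
# as 5-dimensional vectors) for one group, compared against draws from a
# hypothetical shared central distribution; the MMD^2 serves as an alignment term.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.3, scale=1.0, size=(200, 5))
central = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
alignment_penalty = mmd2(group_a, central)
print(f"MMD^2 alignment penalty: {alignment_penalty:.4f}")
```

In a training loop, a term like this (or a Wasserstein or $f$-divergence analogue) would be added to the objective so that each group's posterior is pulled toward the shared central distribution.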
Abstract: Despite recent progress, AI still struggles with advanced mathematics. We consider a difficult open problem: how to derive a Computationally Efficient Equivalent Form (CEEF) for the cycle count statistic. The CEEF problem has no known general solution and requires delicate combinatorics and tedious calculations. Such a task is hard for humans to accomplish, but it is an ideal example of where AI can be very helpful. We solve the problem by combining a novel approach we propose with the powerful coding skills of AI. Our results use delicate graph theory and contain new formulas for general cases that were not previously known. We find that, while AI is unable to solve the problem entirely on its own, it can solve it when we provide a clear strategy, step-by-step guidance, and carefully written prompts. For simplicity, we focus our study on DeepSeek-R1, but we also investigate other AI approaches.
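As a toy analogue of what a computationally efficient equivalent form can look like (not the general formulas derived in the paper), the sketch below counts 3- and 4-cycles of a simple undirected graph using classical trace identities of the adjacency matrix; the function name and the K4 test case are illustrative choices.

```python
import numpy as np

def small_cycle_counts(adj):
    """Count 3- and 4-cycles in a simple undirected graph from its adjacency matrix."""
    a = np.asarray(adj, dtype=float)
    deg = a.sum(axis=1)              # vertex degrees
    m = a.sum() / 2                  # number of edges
    a2 = a @ a
    a3 = a2 @ a
    a4 = a2 @ a2
    # Each triangle contributes 6 closed 3-walks (3 start vertices x 2 directions).
    triangles = np.trace(a3) / 6
    # tr(A^4) counts closed 4-walks; subtract degenerate walks that reuse an edge,
    # then divide by 8 (4 start vertices x 2 directions per 4-cycle).
    four_cycles = (np.trace(a4) - 2 * deg @ deg + 2 * m) / 8
    return int(round(triangles)), int(round(four_cycles))

# Usage: the complete graph K4 has 4 triangles and 3 four-cycles.
k4 = np.ones((4, 4)) - np.eye(4)
print(small_cycle_counts(k4))  # (4, 3)
```

The point of the toy case is that a naive enumeration over vertex tuples is replaced by a few matrix products and degree sums; the paper's CEEF results extend this kind of reduction to general cycle orders.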