Steven An

University of California, San Diego

Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities

Aug 05, 2025

Verónica Álvarez, Santiago Mazuelas, Steven An, Sanjoy Dasgupta

Abstract: The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that provide rough guesses for labels. Weak LFs commonly provide guesses with assorted types and unknown interdependences that can result in unreliable predictions. Furthermore, existing techniques for programmatic weak supervision cannot provide assessments for the reliability of the probabilistic predictions for labels. This paper presents a methodology for programmatic weak supervision that can provide confidence intervals for label probabilities and obtain more reliable predictions. In particular, the methods proposed use uncertainty sets of distributions that encapsulate the information provided by LFs with unrestricted behavior and typology. Experiments on multiple benchmark datasets show the improvement of the presented methods over the state-of-the-art and the practicality of the confidence intervals presented.
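
For readers unfamiliar with the setting, the sketch below illustrates what "weak labeling functions" and "probabilistic predictions for the labels" can look like in the simplest case. It is not the method of this paper: it uses a naive vote-fraction baseline, and it produces no confidence intervals. The three labeling functions, the `ABSTAIN` convention, and the spam/not-spam task are all hypothetical, chosen only for illustration.

```python
from collections import Counter

ABSTAIN = -1  # hypothetical convention: an LF returns -1 when it has no guess


# Three toy labeling functions for a made-up task (1 = spam, 0 = not spam).
def lf_contains_free(text):
    return 1 if "free" in text.lower() else ABSTAIN


def lf_has_greeting(text):
    return 0 if text.lower().startswith(("hi", "hello")) else ABSTAIN


def lf_many_exclamations(text):
    return 1 if text.count("!") >= 2 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_free, lf_has_greeting, lf_many_exclamations]


def vote_probabilities(text, num_classes=2):
    """Naive baseline: the fraction of non-abstaining LFs voting for each class."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    total = sum(counts.values())
    if total == 0:
        # Every LF abstained, so fall back to a uniform distribution.
        return [1.0 / num_classes] * num_classes
    return [counts.get(c, 0) / total for c in range(num_classes)]
```

A baseline like this treats every LF as equally accurate and independent of the others, which is the kind of assumption the abstract identifies as a source of unreliable predictions.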

Convergence Behavior of an Adversarial Weak Supervision Method

May 25, 2024

Steven An, Sanjoy Dasgupta

Abstract: Labeling data via rules-of-thumb and minimal label supervision is central to Weak Supervision, a paradigm subsuming subareas of machine learning such as crowdsourced learning and semi-supervised ensemble learning. By using this labeled data to train modern machine learning methods, the cost of acquiring large amounts of hand-labeled data can be ameliorated. Approaches to combining the rules-of-thumb fall into two camps, reflecting different ideologies of statistical estimation. The most common approach, exemplified by the Dawid-Skene model, is based on probabilistic modeling. The other, developed in the work of Balsubramani-Freund and others, is adversarial and game-theoretic. We provide a variety of statistical results for the adversarial approach under log-loss: we characterize the form of the solution, relate it to logistic regression, demonstrate consistency, and give rates of convergence. On the other hand, we find that probabilistic approaches for the same model class can fail to be consistent. Experimental results are provided to corroborate the theoretical results.

* 49 pages, 16 figures, to be published in UAI 2024
