Title: Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios

Jul 04, 2022

Xueying Zhan, Zeyu Dai, Qingzhong Wang, Qing Li, Haoyi Xiong, Dejing Dou, Antoni B. Chan

Abstract: Pool-based Active Learning (AL) has achieved great success in minimizing labeling cost by sequentially selecting informative unlabeled samples from a large unlabeled data pool and querying their labels from an oracle or annotators. However, existing AL sampling strategies might not work well in out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains some samples that do not belong to the classes of the target task. Achieving good AL performance under OOD data scenarios is challenging because AL sampling strategies and OOD sample detection naturally conflict. AL selects data that the current basic classifier finds hard to classify (e.g., samples whose predicted class probabilities have high entropy), while OOD samples tend to have more uniform predicted class probabilities (i.e., higher entropy) than in-distribution (ID) data. In this paper, we propose a sampling scheme, Monte-Carlo Pareto Optimization for Active Learning (POAL), which selects optimal subsets of unlabeled samples with a fixed batch size from the unlabeled data pool. We cast the AL sampling task as a multi-objective optimization problem and apply Pareto optimization to two conflicting objectives: (1) the normal AL data sampling scheme (e.g., maximum entropy), and (2) the confidence of not being an OOD sample. Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.
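To make the two conflicting objectives concrete, the following is a minimal sketch, not the authors' POAL procedure. POAL searches over fixed-size subsets with a Monte-Carlo scheme; this toy instead scores each sample individually and keeps the non-dominated (Pareto-optimal) ones. The pool probabilities, the `id_confidence` values, and the function names are all hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pareto_front(scores):
    """Return indices of non-dominated points, with every objective maximized.

    Point b dominates point a if b is >= a on every objective
    and strictly > a on at least one.
    """
    front = []
    for i, a in enumerate(scores):
        dominated = any(
            all(x >= y for x, y in zip(b, a)) and any(x > y for x, y in zip(b, a))
            for j, b in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical toy pool: predicted class probabilities per unlabeled sample.
pool_probs = [
    [0.5, 0.5],  # very uncertain
    [0.9, 0.1],  # confident prediction
    [0.5, 0.5],  # very uncertain
    [0.7, 0.3],  # moderately uncertain
]
# Hypothetical "confidence of not being OOD" scores in [0, 1] (assumed given).
id_confidence = [0.2, 0.95, 0.8, 0.5]

# Objective 1: entropy (AL informativeness). Objective 2: ID confidence.
scores = [(entropy(p), c) for p, c in zip(pool_probs, id_confidence)]
candidates = pareto_front(scores)
print(candidates)
```

In this toy pool, sample 0 is as uncertain as sample 2 but looks far more likely to be OOD, so sample 2 dominates it. A selection based on entropy alone could not tell the two apart, which is the conflict the abstract describes.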

View paper on OpenReview
