



Active Learning (AL) aims to reduce the labeling burden by interactively querying the most informative observations from a data pool. Despite extensive research on improving AL query methods in the past years, recent studies have questioned the advantages of AL, especially in the light of emerging alternative training paradigms such as semi-supervised (Semi-SL) and self-supervised learning (Self-SL). Thus, today's AL literature paints an inconsistent picture and leaves practitioners wondering whether and how to employ AL in their tasks. We argue that this heterogeneous landscape is caused by a lack of a systematic and realistic evaluation of AL algorithms, including key parameters such as complex and imbalanced datasets, realistic labeling scenarios, systematic method configuration, and integration of Semi-SL and Self-SL. To this end, we present an AL benchmarking suite and run extensive experiments on five datasets shedding light on the questions: when and how to apply AL?