Abstract: Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users and systems, e.g., compute infrastructure. For broader adoption, this practice must (i) accommodate software engineers without ML backgrounds, and (ii) provide mechanisms to optimize for product goals. In this work, we describe general principles and a specific end-to-end ML platform, Looper, which offers easy-to-use APIs for decision-making and feedback collection. Looper supports the full end-to-end ML lifecycle, from online data collection to model training, deployment, and inference, and extends support to evaluation and tuning against product goals. We outline the platform architecture and the overall impact of production deployment: Looper currently hosts 700 ML models and makes 6 million decisions per second. We also describe the learning curve and summarize the experiences of platform adopters.
Abstract: We study online active learning for classifying streaming instances within the framework of statistical learning theory. At each time, the decision maker decides whether to query for the label of the current instance and, in the event of no query, self-labels the instance. The objective is to minimize the number of queries while constraining the number of classification errors over a horizon of length $T$. We consider a general concept space with a finite VC dimension $d$ and adopt the agnostic setting. We propose a disagreement-based online learning algorithm and establish its $O(d\log^2 T)$ label complexity and $\Theta(1)$ (i.e., bounded) classification errors in excess of the best classifier in the concept space under the Massart bounded noise condition.
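The query-or-self-label loop described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the finite grid of one-dimensional threshold classifiers, the function name `run_disagreement_learner`, and the pruning slack `slack_scale * sqrt(queries * log(t + 1))` are all assumptions chosen for illustration. The sketch only shows the general disagreement-based idea: query a label when the surviving classifiers disagree on the current instance, and self-label otherwise.

```python
import math
import random


def run_disagreement_learner(stream, thresholds, slack_scale=1.0):
    """Illustrative disagreement-based online active learner (hypothetical).

    stream: iterable of (x, y) pairs with y in {0, 1}; y is used only
            when the learner decides to query.
    thresholds: finite grid of classifiers h, each predicting int(x >= h).
    Returns (predictions, number of queries, surviving thresholds).
    """
    errors = {h: 0 for h in thresholds}  # empirical errors on queried points
    active = list(thresholds)            # current version space
    queries = 0
    predictions = []
    for t, (x, y) in enumerate(stream, start=1):
        votes = {int(x >= h) for h in active}
        if len(votes) == 1:
            # All surviving classifiers agree: self-label without querying.
            predictions.append(votes.pop())
            continue
        # x lies in the disagreement region: query the label.
        queries += 1
        predictions.append(y)
        for h in active:
            errors[h] += int(int(x >= h) != y)
        # Keep classifiers whose empirical error is close to the best one.
        # The slack form is an assumed placeholder, not the paper's bound.
        best = min(errors[h] for h in active)
        slack = slack_scale * math.sqrt(queries * math.log(t + 1))
        active = [h for h in active if errors[h] <= best + slack]
    return predictions, queries, active


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(200)]
    stream = [(x, int(x >= 0.5)) for x in xs]  # noiseless labels for the demo
    grid = [i / 20 for i in range(21)]
    preds, n_queries, survivors = run_disagreement_learner(stream, grid)
    print(f"queries: {n_queries} of {len(stream)}; survivors: {len(survivors)}")
```

In the noiseless demo the true threshold is on the grid and never accumulates errors, so it is never pruned and every self-labeled instance is classified correctly. Under label noise the choice of slack matters, and the value used here carries no guarantee.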