Thomas Parnell

Search-based Methods for Multi-Cloud Configuration

Apr 20, 2022
Małgorzata Łazuka, Thomas Parnell, Andreea Anghel, Haralampos Pozidis

Multi-cloud computing has become increasingly popular with enterprises looking to avoid vendor lock-in. While most cloud providers offer similar functionality, they may differ significantly in terms of performance and/or cost. A customer looking to benefit from such differences will naturally want to solve the multi-cloud configuration problem: given a workload, which cloud provider should be chosen and how should its nodes be configured in order to minimize runtime or cost? In this work, we consider solutions to this optimization problem. We develop and evaluate possible adaptations of state-of-the-art cloud configuration solutions to the multi-cloud domain. Furthermore, we identify an analogy between multi-cloud configuration and the selection-configuration problems commonly studied in the automated machine learning (AutoML) field. Inspired by this connection, we utilize popular optimizers from AutoML to solve multi-cloud configuration. Finally, we propose a new algorithm for solving multi-cloud configuration, CloudBandit (CB). It treats the outer problem of cloud provider selection as a best-arm identification problem, in which each arm pull corresponds to running an arbitrary black-box optimizer on the inner problem of node configuration. Our experiments indicate that (a) many state-of-the-art cloud configuration solutions can be adapted to multi-cloud, with the best results obtained for adaptations that exploit the hierarchical structure of the multi-cloud configuration domain, (b) hierarchical methods from AutoML can be applied to the multi-cloud configuration task and can outperform state-of-the-art cloud configuration solutions, and (c) CB achieves competitive or lower regret relative to the other tested algorithms, while also identifying configurations that have 65% lower median cost and 20% lower median time in production, compared to choosing a random provider and configuration.

* Submitted to IEEE Cloud 2022 
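
The two-level structure of CloudBandit lends itself to a compact sketch: the outer loop eliminates providers bandit-style, while each arm pull runs a few steps of a black-box optimizer on that provider's node configurations. The sketch below is a toy illustration under assumed names; `query_cost`, the provider/configuration spaces, and the elimination schedule are placeholders, not the paper's actual setup:

```python
import random

# Hypothetical provider/configuration spaces; real spaces would come from
# the cloud providers' instance catalogues.
PROVIDERS = {
    "provider_a": {"cores": [4, 8, 16], "mem_gb": [16, 32, 64]},
    "provider_b": {"cores": [2, 4, 8, 16], "mem_gb": [8, 16, 32]},
}

def query_cost(provider, config):
    """Stand-in for deploying the workload and measuring its cost."""
    base = {"provider_a": 1.0, "provider_b": 1.3}[provider]
    return base * config["cores"] * 0.1 + 64 / config["mem_gb"] + random.gauss(0, 0.05)

def pull_arm(provider, state, n_trials=3):
    """One arm pull: a few steps of a black-box optimizer (here random
    search) on the inner node-configuration problem of one provider."""
    for _ in range(n_trials):
        cfg = {k: random.choice(v) for k, v in PROVIDERS[provider].items()}
        cost = query_cost(provider, cfg)
        if cost < state["best_cost"]:
            state["best_cost"], state["best_cfg"] = cost, cfg
    return state["best_cost"]

def cloud_bandit(rounds=4):
    arms = {p: {"best_cost": float("inf"), "best_cfg": None} for p in PROVIDERS}
    alive = set(PROVIDERS)
    for _ in range(rounds):
        scores = {p: pull_arm(p, arms[p]) for p in alive}
        if len(alive) > 1:
            alive.discard(max(scores, key=scores.get))  # drop worst provider
    best = min(alive, key=lambda p: arms[p]["best_cost"])
    return best, arms[best]

print(cloud_bandit())
```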

Towards a General Framework for ML-based Self-tuning Databases

Nov 16, 2020
Thomas Schmied, Diego Didona, Andreas Döring, Thomas Parnell, Nikolas Ioannou

Machine learning (ML) methods have recently emerged as an effective way to perform automated parameter tuning of databases. State-of-the-art approaches include Bayesian optimization (BO) and reinforcement learning (RL). In this work, we describe our experience when applying these methods to a database not yet studied in this context: FoundationDB. Firstly, we describe the challenges we faced, such as unknown valid ranges of configuration parameters and combinations of parameter values that result in invalid runs, and how we mitigated them. While these issues are typically overlooked, we argue that they are a crucial barrier to the adoption of ML self-tuning techniques in databases, and thus deserve more attention from the research community. Secondly, we present experimental results obtained when tuning FoundationDB using ML methods. Unlike prior work in this domain, we also compare with the simplest of baselines: random search. Our results show that, while BO and RL methods can improve the throughput of FoundationDB by up to 38%, random search is a highly competitive baseline, finding a configuration that is only 4% worse than those found by the vastly more complex ML methods. We conclude that future work in this area may want to focus more on randomized, model-free optimization algorithms.
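
A minimal random-search tuner makes both of the paper's points concrete: random sampling is trivially parallelizable, and invalid parameter combinations must be handled explicitly or they derail the search. Everything below is illustrative; the knob names and `run_benchmark` are hypothetical stand-ins, not FoundationDB's actual interface:

```python
import random

# Hypothetical tuning knobs with assumed valid ranges.
SPACE = {
    "cache_mb": (64, 8192),
    "num_threads": (1, 64),
    "batch_size": (1, 1024),
}

class InvalidConfig(Exception):
    pass

def run_benchmark(cfg):
    """Stand-in for running the database under `cfg` and measuring throughput."""
    if cfg["num_threads"] > 32 and cfg["cache_mb"] < 256:
        raise InvalidConfig("combination fails to start (hypothetical rule)")
    return cfg["num_threads"] * 100 / (1 + cfg["batch_size"] / cfg["cache_mb"])

def random_search(budget=50):
    best_cfg, best_tput = None, float("-inf")
    for _ in range(budget):
        cfg = {k: random.randint(lo, hi) for k, (lo, hi) in SPACE.items()}
        try:
            tput = run_benchmark(cfg)
        except InvalidConfig:
            continue  # the trial consumes budget but yields no observation
        if tput > best_tput:
            best_cfg, best_tput = cfg, tput
    return best_cfg, best_tput

print(random_search())
```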

Differentially Private Stochastic Coordinate Descent

Jul 10, 2020
Georgios Damaskinos, Celestine Mendler-Dünner, Rachid Guerraoui, Nikolaos Papandreou, Thomas Parnell

In this paper we tackle the challenge of making the stochastic coordinate descent algorithm differentially private. Compared to the classical gradient descent algorithm, where updates operate on a single model vector and controlled noise addition to this vector suffices to hide critical information about individuals, stochastic coordinate descent crucially relies on keeping auxiliary information in memory during training. This auxiliary information constitutes an additional source of privacy leakage and poses the major challenge addressed in this work. Driven by the insight that under independent noise addition, the consistency of the auxiliary information holds in expectation, we present DP-SCD, the first differentially private stochastic coordinate descent algorithm. We analyze our new method theoretically and argue that decoupling and parallelizing coordinate updates is essential for its utility. On the empirical side we demonstrate competitive performance against the popular stochastic gradient descent alternative (DP-SGD) while requiring significantly less tuning.
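
As a rough illustration of the noise-addition idea (not the paper's actual DP-SCD mechanism or privacy accounting), the toy coordinate-descent solver below clips each example's contribution to the coordinate gradient, perturbs the update with Gaussian noise, and keeps the auxiliary residual vector consistent with the model:

```python
import numpy as np

def dp_cd_ridge(X, y, lam=0.1, epochs=10, clip=1.0, sigma=0.5, seed=0):
    """Toy noisy coordinate descent for ridge regression (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    r = y - X @ w                      # residual: the auxiliary information
    for _ in range(epochs):
        for j in rng.permutation(d):
            contrib = np.clip(X[:, j] * r, -clip, clip)   # bound sensitivity
            grad = -contrib.sum() / n + lam * w[j]
            grad += rng.normal(0.0, sigma * clip / n)     # Gaussian noise
            lip = X[:, j] @ X[:, j] / n + lam             # coordinate smoothness
            delta = -grad / lip
            w[j] += delta
            r -= delta * X[:, j]       # keep the residual consistent
    return w

X = np.random.default_rng(1).normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * np.random.default_rng(2).normal(size=200)
print(dp_cd_ridge(X, y).round(2))
```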

MixBoost: A Heterogeneous Boosting Machine

Jun 17, 2020
Thomas Parnell, Andreea Anghel, Malgorzata Lazuka, Nikolas Ioannou, Sebastian Kurella, Peshal Agarwal, Nikolaos Papandreou, Haralampos Pozidis

Modern gradient boosting software frameworks, such as XGBoost and LightGBM, implement Newton descent in a functional space. At each boosting iteration, their goal is to find the base hypothesis, selected from some base hypothesis class, that is closest to the Newton descent direction in a Euclidean sense. Typically, the base hypothesis class is fixed to be all binary decision trees up to a given depth. In this work, we study a Heterogeneous Newton Boosting Machine (HNBM) in which the base hypothesis class may vary across boosting iterations. Specifically, at each boosting iteration, the base hypothesis class is chosen, from a fixed set of subclasses, by sampling from a probability distribution. We derive a global linear convergence rate for the HNBM under certain assumptions, and show that it agrees with existing rates for Newton's method when the Newton direction can be perfectly fitted by the base hypothesis at each boosting iteration. We then describe a particular realization of an HNBM, MixBoost, that, at each boosting iteration, randomly selects between either a decision tree of variable depth or a linear regressor with random Fourier features. We describe how MixBoost is implemented, with a focus on the training complexity. Finally, we present experimental results, using OpenML and Kaggle datasets, that show that MixBoost is able to achieve a lower generalization loss than competing boosting frameworks, without taking significantly longer to tune.
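
The class-sampling idea can be sketched in a few lines. The toy booster below uses squared loss (for which the Newton direction reduces to the residual) and draws each base learner from an assumed distribution over two subclasses, a variable-depth tree or a ridge regressor on random Fourier features; the loss handling and hyperparameters differ from the paper's implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import make_pipeline

def mixboost_fit(X, y, rounds=50, lr=0.1, p_tree=0.7, seed=0):
    rng = np.random.default_rng(seed)
    pred, learners = np.zeros_like(y, dtype=float), []
    for _ in range(rounds):
        residual = y - pred                       # Newton direction for L2 loss
        if rng.random() < p_tree:                 # sample the hypothesis class
            h = DecisionTreeRegressor(max_depth=int(rng.integers(2, 7)))
        else:                                     # linear model on random features
            h = make_pipeline(
                RBFSampler(n_components=50, random_state=int(rng.integers(10**6))),
                Ridge(alpha=1.0))
        h.fit(X, residual)
        pred += lr * h.predict(X)
        learners.append(h)
    return learners

def mixboost_predict(learners, X, lr=0.1):
    return lr * sum(h.predict(X) for h in learners)

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = np.sin(X[:, 0]) + X[:, 1] ** 2
model = mixboost_fit(X, y)
print(np.mean((mixboost_predict(model, X) - y) ** 2))  # training MSE
```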

SySCD: A System-Aware Parallel Coordinate Descent Algorithm

Nov 18, 2019
Nikolas Ioannou, Celestine Mendler-Dünner, Thomas Parnell

In this paper we propose a novel parallel stochastic coordinate descent (SCD) algorithm with convergence guarantees that exhibits strong scalability. We start by studying a state-of-the-art parallel implementation of SCD and identify scalability as well as system-level performance bottlenecks of the respective implementation. We then take a principled approach to develop a new SCD variant which is designed to avoid the identified system bottlenecks, such as limited scaling due to coherence traffic of model sharing across threads, and inefficient CPU cache accesses. Our proposed system-aware parallel coordinate descent algorithm (SySCD) scales to many cores and across NUMA nodes, and offers a consistent bottom-line speedup in training time of up to 12x compared to an optimized asynchronous parallel SCD algorithm, and up to 42x compared to state-of-the-art GLM solvers (scikit-learn, Vowpal Wabbit, and H2O) on a range of datasets and multi-core CPU architectures.

* Accepted as a spotlight at NeurIPS 2019, Vancouver, Canada 
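
The core decoupling idea can be caricatured in a single process: give each worker a private replica of the model and a disjoint bucket of coordinates, so no state is shared within an epoch, and merge the replicas afterwards. The sketch below simulates the workers sequentially; SySCD's actual implementation (thread affinity, NUMA-aware buckets, dynamic repartitioning) is far more involved:

```python
import numpy as np

def syscd_sketch_ridge(X, y, lam=0.1, workers=4, epochs=20, seed=0):
    """Replica-per-worker coordinate descent for ridge regression (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    parts = np.array_split(rng.permutation(d), workers)  # coordinate buckets
    for _ in range(epochs):
        replicas = []
        for coords in parts:                 # each loop body = one worker
            w_loc = w.copy()                 # private replica: nothing shared
            r = y - X @ w_loc
            for j in coords:
                lip = X[:, j] @ X[:, j] / n + lam
                grad = -(X[:, j] @ r) / n + lam * w_loc[j]
                delta = -grad / lip
                w_loc[j] += delta
                r -= delta * X[:, j]
            replicas.append(w_loc)
        w = np.mean(replicas, axis=0)        # merge step between epochs
    return w

X = np.random.default_rng(1).normal(size=(200, 8))
y = X @ np.arange(8, dtype=float)
print(syscd_sketch_ridge(X, y).round(2))
```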

Breadth-first, Depth-next Training of Random Forests

Oct 15, 2019
Andreea Anghel, Nikolas Ioannou, Thomas Parnell, Nikolaos Papandreou, Celestine Mendler-Dünner, Haris Pozidis

In this paper we analyze, evaluate, and improve the performance of training Random Forest (RF) models on modern CPU architectures. An exact, state-of-the-art binary decision tree building algorithm is used as the basis of this study. Firstly, we investigate the trade-offs between using different tree building algorithms, namely breadth-first search (BFS) and depth-first search (DFS). We design a novel, dynamic, hybrid BFS-DFS algorithm and demonstrate that it performs better than both BFS and DFS, and is more robust in the presence of workloads with different characteristics. Secondly, we identify CPU performance bottlenecks when generating trees using this approach, and propose optimizations to alleviate them. The proposed hybrid tree building algorithm for RF is implemented in the Snap Machine Learning framework, and speeds up the training of RFs by 7.8x on average when compared to state-of-the-art RF solvers (sklearn, H2O, and xgboost) on a range of datasets, RF configurations, and multi-core CPU architectures.
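
The hybrid traversal can be sketched independently of the split criterion: grow the tree level by level (FIFO frontier) while nodes are large, then finish each sufficiently small subtree depth-first (LIFO stack) so its data stays cache-resident. The split rule below (median of a random feature) and the `switch_size` threshold are placeholders for the paper's exact criterion and dynamic switching policy:

```python
import numpy as np
from collections import deque

def build_hybrid(X, max_depth=8, switch_size=64, rng=None):
    """Grow large nodes breadth-first, finish small subtrees depth-first."""
    rng = rng or np.random.default_rng(0)
    frontier = deque([(np.arange(len(X)), 0)])   # BFS phase: FIFO frontier
    nodes = []
    while frontier:
        rows, d = frontier.popleft()
        if d >= max_depth or len(rows) < 2:
            nodes.append(("leaf", len(rows)))
            continue
        if len(rows) <= switch_size:             # small node: switch to DFS
            nodes.extend(build_dfs(X, rows, d, max_depth, rng))
            continue
        f = int(rng.integers(X.shape[1]))        # placeholder split feature
        thr = np.median(X[rows, f])
        left, right = rows[X[rows, f] <= thr], rows[X[rows, f] > thr]
        if len(left) == 0 or len(right) == 0:
            nodes.append(("leaf", len(rows)))
            continue
        nodes.append(("split", f, thr))
        frontier.extend([(left, d + 1), (right, d + 1)])
    return nodes

def build_dfs(X, rows, depth, max_depth, rng):
    stack, nodes = [(rows, depth)], []           # DFS phase: LIFO stack
    while stack:
        rows, d = stack.pop()
        if d >= max_depth or len(rows) < 2:
            nodes.append(("leaf", len(rows)))
            continue
        f = int(rng.integers(X.shape[1]))
        thr = np.median(X[rows, f])
        left, right = rows[X[rows, f] <= thr], rows[X[rows, f] > thr]
        if len(left) == 0 or len(right) == 0:
            nodes.append(("leaf", len(rows)))
            continue
        nodes.append(("split", f, thr))
        stack.extend([(left, d + 1), (right, d + 1)])
    return nodes

X = np.random.default_rng(4).normal(size=(1000, 5))
print(len(build_hybrid(X)))
```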

Learning to Tune XGBoost with XGBoost

Sep 19, 2019
Johanna Sommer, Dimitrios Sarigiannis, Thomas Parnell

In this short paper we investigate whether meta-learning techniques can be used to more effectively tune the hyperparameters of machine learning models using successive halving (SH). We propose MeSH, a novel variant of the SH algorithm that uses meta-regressors to determine which candidate configurations should be eliminated at each round. We apply MeSH to the problem of tuning the hyperparameters of a gradient-boosted decision tree model. By training and tuning our meta-regressors using existing tuning jobs from 95 datasets, we demonstrate that MeSH can often find a superior solution to both SH and random search.

* 5 pages (references included) 
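
The mechanism can be sketched as standard successive halving with one substitution: survivors are ranked by a meta-regressor's prediction of their final score rather than by their current score. The `meta_regressor` below is a crude stub that extrapolates the score curve linearly; MeSH trains real regressors on historical tuning jobs:

```python
import random

def evaluate(cfg, budget):
    """Stand-in for training a model under `cfg` for `budget` iterations."""
    return cfg["quality"] * (1 - 0.5 ** (budget / 10)) + random.gauss(0, 0.01)

def meta_regressor(curve, final_budget):
    """Stub predictor of the final score from the partial learning curve."""
    if len(curve) < 2:
        return curve[-1][1]
    slope = curve[-1][1] - curve[-2][1]
    return curve[-1][1] + slope * (final_budget - curve[-1][0])

def mesh(configs, rounds=3, base_budget=5):
    curves = {i: [] for i in range(len(configs))}
    alive = list(range(len(configs)))
    final_budget = base_budget * 2 ** (rounds - 1)
    for r in range(rounds):
        budget = base_budget * 2 ** r
        for i in alive:
            curves[i].append((budget, evaluate(configs[i], budget)))
        preds = {i: meta_regressor(curves[i], final_budget) for i in alive}
        # keep the half with the best *predicted* final score
        alive = sorted(alive, key=preds.get, reverse=True)[:max(1, len(alive) // 2)]
    return configs[alive[0]]

configs = [{"quality": random.random()} for _ in range(16)]
print(mesh(configs))
```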

Weighted Sampling for Combined Model Selection and Hyperparameter Tuning

Sep 17, 2019
Dimitrios Sarigiannis, Thomas Parnell, Haris Pozidis

The combined algorithm selection and hyperparameter tuning (CASH) problem is characterized by large hierarchical hyperparameter spaces. Model-free hyperparameter tuning methods can explore such large spaces efficiently since they are highly parallelizable across multiple machines. When no prior knowledge or meta-data exists to boost their performance, these methods commonly sample random configurations following a uniform distribution. In this work, we propose a novel sampling distribution as an alternative to uniform sampling and prove theoretically that it has a better chance of finding the best configuration in a worst-case setting. In order to compare competing methods rigorously in an experimental setting, one must perform statistical hypothesis testing. We show that there is little-to-no agreement in the automated machine learning literature regarding which statistical testing methods should be used. We contrast this disparity with the methods recommended by the broader statistics literature, and identify the most suitable approach. We then select three popular model-free solutions to CASH and evaluate their performance, with uniform sampling as well as the proposed sampling scheme, across 67 datasets from the OpenML platform. We investigate the trade-off between exploration and exploitation across the three algorithms, and verify empirically that the proposed sampling distribution improves performance in all cases.

* 14 pages 
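
The proposal can be illustrated schematically: draw the model family from a non-uniform distribution before sampling its hyperparameters. The weights below (mass proportional to the number of hyperparameters) are an assumption made for this demo, not the distribution derived in the paper, and continuous sampling is used for all parameters for simplicity:

```python
import random

# Hypothetical CASH search space: model families and hyperparameter ranges.
SEARCH_SPACE = {
    "logreg": {"C": (1e-3, 1e3)},
    "random_forest": {"n_estimators": (10, 500), "max_depth": (2, 20)},
    "xgboost": {"n_estimators": (10, 500), "max_depth": (2, 12),
                "learning_rate": (1e-3, 0.5)},
}

# Assumed weighting: more mass on families with more hyperparameters.
WEIGHTS = {m: len(hp) for m, hp in SEARCH_SPACE.items()}

def sample_configuration():
    model = random.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]
    cfg = {k: random.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE[model].items()}
    return model, cfg

for _ in range(3):
    print(sample_configuration())
```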