Abstract: Managing the configurations of a database system poses significant challenges due to the multitude of configuration knobs that impact various system aspects. The lack of standardization, independence, and universality among these knobs further complicates the task of determining the optimal settings. To address this issue, an automated solution leveraging supervised and unsupervised machine learning techniques was developed. This solution aims to identify influential knobs, analyze previously unseen workloads, and provide recommendations for knob settings. The effectiveness of this approach is demonstrated through the evaluation of a new tool called OtterTune [1] on three different database management systems (DBMSs). The results indicate that OtterTune's recommendations are comparable to or even surpass the configurations generated by existing tools or human experts. In this study, we build upon the automated technique introduced in the original OtterTune paper, utilizing previously collected training data to optimize new DBMS deployments. By employing supervised and unsupervised machine learning methods, we focus on improving latency prediction. Our approach expands upon the methods proposed in the original paper by incorporating GMM clustering to streamline metrics selection and combining ensemble models (such as RandomForest) with non-linear models (like neural networks) for more accurate prediction modeling.
Abstract: The management of database system configurations is a challenging task, as there are hundreds of configuration knobs that control every aspect of the system. This is complicated by the fact that these knobs are not standardized, independent, or universal, making it difficult to determine optimal settings. An automated approach to this problem uses supervised and unsupervised machine learning methods to select impactful knobs, map unseen workloads, and recommend knob settings. It was implemented in a new tool called OtterTune and evaluated on three DBMSs, with results demonstrating that it recommends configurations as good as or better than those generated by existing tools or a human expert. In this work, we extend an automated technique based on OtterTune [1] that reuses training data gathered from previous sessions to tune new DBMS deployments, applying supervised and unsupervised machine learning methods to improve latency prediction. Our approach expands the methods proposed in the original paper: we use GMM clustering to prune metrics and combine ensemble models, such as RandomForest, with non-linear models, like neural networks, for prediction modeling.
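As a rough illustration of the two extensions named above, the following Python sketch prunes redundant metrics with a Gaussian mixture model and then averages a random forest with a small neural network to predict latency. It is a minimal sketch under stated assumptions, not the implementation evaluated in this work: it assumes scikit-learn is available, the data are synthetic, the function names `prune_metrics` and `fit_latency_ensemble` are hypothetical, and both the choice of cluster representative (the metric closest to the component mean) and the equal-weight averaging of the two models are assumptions made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def prune_metrics(metrics, n_clusters, seed=0):
    """Cluster metric columns with a GMM and keep one representative per cluster.

    metrics: array of shape (n_observations, n_metrics).
    Returns the sorted column indices of the retained metrics.
    """
    # Treat each metric as one sample: its standardized values over all observations.
    profiles = StandardScaler().fit_transform(metrics).T
    gmm = GaussianMixture(
        n_components=n_clusters,
        covariance_type="diag",
        reg_covar=1e-3,
        random_state=seed,
    ).fit(profiles)
    labels = gmm.predict(profiles)

    kept = []
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        # Assumed rule: keep the member closest to the component mean.
        dists = np.linalg.norm(profiles[members] - gmm.means_[k], axis=1)
        kept.append(int(members[np.argmin(dists)]))
    return sorted(kept)


def fit_latency_ensemble(X, y, seed=0):
    """Fit a random forest and an MLP, and return a function that averages them."""
    forest = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X, y)
    scaler = StandardScaler().fit(X)
    net = MLPRegressor(
        hidden_layer_sizes=(32,), max_iter=500, random_state=seed
    ).fit(scaler.transform(X), y)

    def predict(X_new):
        # Assumed combination rule: unweighted mean of the two predictions.
        return 0.5 * forest.predict(X_new) + 0.5 * net.predict(scaler.transform(X_new))

    return predict


# Synthetic stand-in data: 3 latent factors, each observed through 4 noisy metrics.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 3))
metrics = np.repeat(factors, 4, axis=1) + 0.05 * rng.normal(size=(200, 12))
latency = 10.0 + 2.0 * factors[:, 0] - factors[:, 1] + 0.1 * rng.normal(size=200)

kept = prune_metrics(metrics, n_clusters=3)
predict = fit_latency_ensemble(metrics[:, kept], latency)
preds = predict(metrics[:, kept])
```

Here `kept` holds the indices of the metrics that survive pruning, and `preds` holds the averaged latency predictions on the training observations. In practice the number of clusters and the weighting between the two models would be tuned on held-out data.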