Fluid: Resource-aware Hyperparameter Tuning Engine

Part of Proceedings of Machine Learning and Systems 3 (MLSys 2021)

Bibtex Paper

Authors

Peifeng Yu, Jiachen Liu, Mosharaf Chowdhury

Abstract

Current hyperparameter tuning solutions lack complementary execution engines to efficiently leverage distributed computation, thus ignoring the possibility of intra- and inter-GPU sharing, which exhibits poor resource usage. In this paper, we present Fluid, a generalized hyperparameter tuning execution engine, that coordinates between hyperparameter tuning jobs and cluster resources. Fluid schedules evaluation trials in such jobs using a water-filling approach to make the best use of resources both at intra- and inter-GPU granularities to speed up the tuning process. By abstracting a hyperparameter tuning job as a sequence of TrialGroup, Fluid can boost the performance of diverse hyperparameter tuning solutions. Our experiments show that Fluid can speed up synchronous BOHB by 200%, and BOHB and ASHA by 30% while having similar final accuracy.