Fluid: Resource-aware Hyperparameter Tuning Engine

Part of Proceedings of Machine Learning and Systems 3 (MLSys 2021)

Bibtex Paper


Peifeng Yu, Jiachen Liu, Mosharaf Chowdhury


Current hyperparameter tuning solutions lack complementary execution engines to efficiently leverage distributed computation, thus ignoring the possibility of intra- and inter-GPU sharing, which exhibits poor resource usage. In this paper, we present Fluid, a generalized hyperparameter tuning execution engine, that coordinates between hyperparameter tuning jobs and cluster resources. Fluid schedules evaluation trials in such jobs using a water-filling approach to make the best use of resources both at intra- and inter-GPU granularities to speed up the tuning process. By abstracting a hyperparameter tuning job as a sequence of TrialGroup, Fluid can boost the performance of diverse hyperparameter tuning solutions. Our experiments show that Fluid can speed up synchronous BOHB by 200%, and BOHB and ASHA by 30% while having similar final accuracy.