Value Learning for Throughput Optimization of Deep Learning Workloads

Part of Proceedings of Machine Learning and Systems 3 (MLSys 2021)

Bibtex Paper

Authors

Benoit Steiner, Chris Cummins, Horace He, Hugh Leather

Abstract

As the usage of machine learning techniques is becoming ubiquitous, the efficient execution of deep learning models is crucial to many applications. Frameworks such as Halide or TVM separate the algorithmic representation of the neural network from the schedule that determines its implementation. Finding good schedules, however, remains extremely challenging. Autotuning methods, which search the space of valid schedules and execute each candidate on the hardware, identify some of the best performing schedules, but the search can take hours, hampering the productivity of deep learning practitioners. What is needed is a method that achieves a similar performance without extensive search, delivering the needed efficiency quickly.

We model the scheduling process as a sequence of optimization choices, and present a new technique to accurately predict the expected performance of a partial schedule using a LSTM over carefully engineered features that describe each DNN operator and their current scheduling choices. Leveraging these predictions we are able to make these optimization decisions greedily, and without any executions on the target hardware, quickly identify an efficient schedule.

Our evaluation shows that our performance predictions are one order of magnitude more accurate than the state of the art. This enables us to find schedules that improve the execution performance of deep neural networks by 1.5x or more over the best autoschedulers. Moreover, our technique is two to three orders of magnitude faster than these tools, and completes in seconds instead of hours.