Part of Proceedings of Machine Learning and Systems 2 (MLSys 2020)
Joshua Fromm, Meghan Cowan, Matthai Philipose, Luis Ceze, Shwetak Patel
Binarized neural networks have attracted much recent attention due to their promise of making convolutional neural networks fast and compact. However, these benefits have proven hard to realize in practice. In this paper, we identify the underlying barriers to high performance and propose solutions ranging from missing implementations for certain operations to carefully scheduled library support for binarized linear algebra operations. The combination of these innovations allows us to report the first measured end-to-end speedups for binarized networks. For instance, we show a 6.3× speedup over a standard VGGNet variant at state-of-the-art accuracy (64.2% top-1 for binarized classification on ImageNet). More broadly, speedups range from 4× to 12×, and the techniques we propose are crucial to achieving them.
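To make the core primitive behind "binarized linear algebra operations" concrete, the sketch below shows one common formulation (not taken from this paper's implementation): a dot product over {-1, +1} vectors computed by packing signs into bits and combining XNOR with a popcount. The function name `binary_dot` and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def binary_dot(a, b):
    """XNOR-popcount dot product of two {-1, +1} vectors (illustrative sketch).

    Equivalent to np.dot(a, b): each matching sign contributes +1 and each
    mismatch contributes -1, so the result is (#matches - #mismatches).
    """
    assert a.shape == b.shape
    n = a.size
    # Pack signs into bits: +1 -> 1, -1 -> 0.
    pa = np.packbits((a > 0).astype(np.uint8))
    pb = np.packbits((b > 0).astype(np.uint8))
    # XNOR = NOT(XOR); a set bit marks positions where the signs agree.
    xnor = ~(pa ^ pb)
    # Only the first n bits are meaningful; drop the packing padding before counting.
    matches = np.count_nonzero(np.unpackbits(xnor)[:n])
    return 2 * matches - n

# Agrees with the full-precision dot product on {-1, +1} inputs.
rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=100)
b = rng.choice([-1, 1], size=100)
assert binary_dot(a, b) == int(np.dot(a, b))
```

In an optimized kernel the popcount would run directly on packed machine words rather than unpacking bits as done here; the sketch only illustrates the arithmetic identity that makes binarized layers cheap.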