Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity

Part of Proceedings of Machine Learning and Systems 3 (MLSys 2021)


Toshiaki Wakatsuki, Sekitoshi Kanai, Yasuhiro Fujiwara


This paper proposes a range-bound-aware convolution layer that accelerates the inference of rectified linear unit (ReLU)-based convolutional neural networks (CNNs) for analyzing video streams. Because video analysis systems must process each video frame in real time, the computational cost of CNN inference must be reduced. Several techniques heuristically skip the computation for the current frame and reuse the results of the previous frame when the two frames are sufficiently similar. However, for critical applications such as surveillance systems, these techniques can be unsatisfactory because they trade accuracy for efficiency. In contrast, our method reduces the computational cost of convolution layers followed by ReLU while producing exactly the same inference results as the original model. We utilize both the temporal similarity of video frames and the activation sparsity of ReLU-based CNNs to guarantee that only truly redundant computations are skipped. We experimentally confirm that our method accelerates widely used pre-trained CNNs in both CPU and GPU implementations.
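
As a rough illustration of how a range bound can skip work without changing the output, consider a single ReLU neuron whose pre-activation for the previous frame is cached. The sketch below is not the paper's algorithm: the bound it uses (the L1 norm of the weights times the largest per-element input change) is an assumption chosen for simplicity, and the names `relu_neuron_with_skip`, `z_prev`, and `x_prev` are hypothetical. If the upper bound on the new pre-activation is not positive, ReLU must output zero, so the dot product can be skipped. Integer inputs are assumed so that the comparison is exact; with floating point, the bound would also need a margin for rounding error.

```python
def dot(w, x):
    """Plain dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def relu(v):
    """Rectified linear unit for a scalar."""
    return v if v > 0 else 0


def relu_neuron_with_skip(w, b, x_prev, z_prev, x_cur):
    """Return (output, skipped) for ReLU(w . x_cur + b).

    z_prev is the cached pre-activation w . x_prev + b from the previous frame.
    Assumed bound (illustrative, not taken from the paper):
        w . x_cur + b <= z_prev + ||w||_1 * max_i |x_cur[i] - x_prev[i]|
    """
    max_delta = max(abs(c - p) for c, p in zip(x_cur, x_prev))
    l1_norm = sum(abs(wi) for wi in w)
    upper_bound = z_prev + l1_norm * max_delta
    if upper_bound <= 0:
        # The pre-activation cannot be positive, so ReLU yields exactly 0.
        return 0, True
    # The bound is inconclusive: fall back to the exact computation.
    return relu(dot(w, x_cur) + b), False


# Hypothetical weights and two consecutive "frames" (integer-valued).
w, b = [1, -2, 1], -10
x_prev = [3, 4, 2]
z_prev = dot(w, x_prev) + b  # cached from the previous frame

similar_frame = [4, 4, 3]    # small change: the bound proves the output is 0
changed_frame = [20, 0, 5]   # large change: the exact computation is needed

out_similar = relu_neuron_with_skip(w, b, x_prev, z_prev, similar_frame)
out_changed = relu_neuron_with_skip(w, b, x_prev, z_prev, changed_frame)
```

In both cases the returned output equals the plain `relu(dot(w, x) + b)`; the shortcut only applies when the bound proves that the result is zero, which is why similar frames and sparse activations together make the skip frequent.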