Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds

Part of Proceedings of Machine Learning and Systems 5 pre-proceedings (MLSys 2023)


Ke Hong, Zhongming Yu, Guohao Dai, Xinhao Yang, Yaoxiu Lian, Zehao Liu, Ningyi Xu, Yu Wang


Sparse convolution is the key operator in widely used 3D point cloud networks. However, because voxelized input point cloud data is highly sparse, current 3D point cloud engines must solve three main challenges to perform sparse convolution efficiently: (1) Memory under-utilization: the mapping information from input data to the weight parameters of 3D point cloud networks is sparse, leading to up to 79.97% redundant memory access and under-utilized memory space; (2) Computation under-utilization: previous FGMS (Fused Gather-Matrix-Multiplication-Scatter) operations in sparse convolution are executed sequentially, leading to a GPU computation utilization of only 22.84%; (3) Input dynamics: the single, static dataflow used in current point cloud engines cannot always achieve the best performance across different input point cloud data. To tackle these challenges, we propose PCEngine, an efficient sparse convolution engine for voxel-based 3D point cloud networks. PCEngine introduces a novel coded-CSR (Compressed Sparse Row) format that represents the mapping information without redundancy. PCEngine also introduces an indicator-assisted segmented FGMS fusion scheme to fully utilize the computation resources of GPU hardware. PCEngine further deploys a heuristic adaptive dataflow to handle input dynamics. Extensive experimental results show that PCEngine achieves average speedups of 1.81× and 1.64× for the sparse convolution operation and for end-to-end point cloud networks, respectively.
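To make the gather, matrix-multiplication, and scatter steps concrete, the following is a minimal pure-Python sketch of a sparse convolution whose input-to-weight mapping is stored in a plain CSR-style layout (one row per kernel offset). It is not PCEngine's coded-CSR format or its fused GPU kernel. The function names `build_kernel_map` and `sparse_conv`, the 3×3×3 kernel, and the assumption that output voxels coincide with input voxels are all illustrative choices that do not come from the paper.

```python
from itertools import product


def build_kernel_map(coords, kernel_size=3):
    """Build a CSR-style map from kernel offsets to (input, output) pairs.

    Assumption for this sketch: output voxels are the same as input voxels.
    row_ptr[k]:row_ptr[k + 1] delimits the pairs that use weight offset k.
    """
    index_of = {c: i for i, c in enumerate(coords)}
    r = kernel_size // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    row_ptr, in_idx, out_idx = [0], [], []
    for dx, dy, dz in offsets:
        for o, (x, y, z) in enumerate(coords):
            i = index_of.get((x + dx, y + dy, z + dz))
            if i is not None:  # most neighbours are empty, so the map is sparse
                in_idx.append(i)
                out_idx.append(o)
        row_ptr.append(len(in_idx))
    return offsets, row_ptr, in_idx, out_idx


def sparse_conv(features, weights, row_ptr, in_idx, out_idx, c_out):
    """Sequential gather, multiply, scatter-accumulate, one kernel offset at a time.

    features: one list of c_in values per voxel.
    weights:  one c_in x c_out matrix (nested lists) per kernel offset.
    """
    out = [[0.0] * c_out for _ in features]
    for k in range(len(row_ptr) - 1):
        w = weights[k]
        for p in range(row_ptr[k], row_ptr[k + 1]):
            src = features[in_idx[p]]  # gather the input feature row
            dst = out[out_idx[p]]      # output row to accumulate into
            for ci, v in enumerate(src):
                for co in range(c_out):
                    dst[co] += v * w[ci][co]  # multiply and scatter-accumulate
    return out


# Three voxels: two adjacent along x and one isolated.
coords = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
features = [[1.0], [2.0], [3.0]]
offsets, row_ptr, in_idx, out_idx = build_kernel_map(coords)
weights = [[[1.0]] for _ in offsets]  # all-ones 1x1 weight per offset
result = sparse_conv(features, weights, row_ptr, in_idx, out_idx, c_out=1)
print(row_ptr[-1], "of", len(offsets) * len(coords), "possible pairs are used")
print(result)
```

In this toy input only 5 of the 81 possible (offset, voxel) pairs are populated, which illustrates why a dense mapping table wastes memory. The outer loop over kernel offsets runs one offset at a time, which corresponds to the sequential execution pattern the abstract identifies as under-utilizing the GPU.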