Hyeontaek Lim, David G Andersen, Michael Kaminsky
3LC is a lossy compression scheme for state change traffic in distributed machine learning (ML) that strikes a balance between multiple goals: traffic reduction, accuracy, computation overhead, and generality. It combines three techniques---3-value quantization with sparsity multiplication, base-3^5 encoding, and zero-run encoding---to leverage the strengths of quantization and sparsification techniques and avoid their drawbacks. 3LC achieves a data compression ratio of up to 39--107X, preserves the high test accuracy of trained models, and provides high compression speed. Distributed ML frameworks can use 3LC without modifications to existing ML algorithms. Our experiments show that 3LC reduces wall-clock training time of ResNet-110 for CIFAR-10 on a bandwidth-constrained 10-GPU cluster by up to 16--23X compared to TensorFlow's baseline design.