Part of Proceedings of Machine Learning and Systems 2 (MLSys 2020)
Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, Stratos Idreos
Ensembles of deep neural networks significantly improve generalization accuracy. However, training neural network ensembles requires a large amount of computational resources and time. State-of-the-art approaches either train all networks from scratch, leading to prohibitive training cost, or generate ensembles by training a single monolithic architecture, resulting in lower diversity and accuracy. We propose MotherNets to address these shortcomings: a MotherNet captures the structural similarity across the members of a deep neural network ensemble. To train an ensemble, we first train a single MotherNet (or a small set of MotherNets) and then transfer its learned function to every member of the ensemble. We then continue to train the ensemble networks, which converge significantly faster than when trained from scratch. MotherNets handle ensembles with diverse architectures by clustering ensemble networks of similar architecture and training a separate MotherNet for every cluster. MotherNets also use clustering to balance the accuracy vs. training cost tradeoff. We show that, compared to state-of-the-art approaches such as Snapshot Ensembles, knowledge distillation, and TreeNets, MotherNets achieve better accuracy within the same training-time budget, or alternatively the same accuracy at a fraction of the training time. Overall, we demonstrate that MotherNets bring not only performance and accuracy improvements but also a powerful new way to balance the training cost vs. accuracy tradeoff, and we verify these benefits over numerous state-of-the-art neural network architectures.
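To make the overall workflow concrete, the following is a minimal sketch (not the authors' implementation) of the train-then-transfer idea in PyTorch, assuming simple fully connected ensemble members and a hypothetical widen_from() helper that stands in for the paper's function-preserving transfer of the MotherNet into each (larger) ensemble architecture.

```python
# Minimal sketch of the MotherNets workflow, assuming MLP ensemble members.
# widen_from() is a simplified, hypothetical stand-in for the function transfer
# described in the paper: it copies the trained MotherNet weights into the
# corresponding block of each wider ensemble member before further training.
import torch
import torch.nn as nn

def make_mlp(hidden_sizes, in_dim=784, out_dim=10):
    """Build a fully connected network with the given hidden-layer widths."""
    layers, prev = [], in_dim
    for h in hidden_sizes:
        layers += [nn.Linear(prev, h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

def widen_from(mothernet, target_hidden, in_dim=784, out_dim=10):
    """Copy the MotherNet's weights into the top-left block of each layer of a
    wider target network (remaining weights keep their random initialization)."""
    target = make_mlp(target_hidden, in_dim, out_dim)
    src_layers = [m for m in mothernet if isinstance(m, nn.Linear)]
    dst_layers = [m for m in target if isinstance(m, nn.Linear)]
    with torch.no_grad():
        for src, dst in zip(src_layers, dst_layers):
            r, c = src.weight.shape
            dst.weight[:r, :c] = src.weight
            dst.bias[:r] = src.bias
    return target

# Step 1: train a single MotherNet that captures the shared structure.
mothernet = make_mlp([32, 32])
# ... train `mothernet` on the full training set ...

# Step 2: hatch ensemble members of diverse widths from the trained MotherNet,
# then continue training each one; they converge faster than from scratch.
ensemble = [widen_from(mothernet, h) for h in ([64, 64], [128, 64], [64, 128])]
```

For ensembles with structurally dissimilar members, the same sketch would be applied per cluster: networks of similar architecture share one MotherNet, and a separate MotherNet is trained for each cluster.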