A Tale of Two Models: Constructing Evasive Attacks on Edge Models

Part of Proceedings of Machine Learning and Systems 4 pre-proceedings (MLSys 2022)




Wei Hao, Aahil Awatramani, Jiayang Hu, Chengzhi Mao, Pin-Chun Chen, Eyal Cidon, Asaf Cidon, Junfeng Yang


Full-precision deep learning models are typically too large or costly to deploy on edge devices. To accommodate limited hardware resources, models are adapted to the edge using various edge-adaptation techniques, such as quantization and pruning. While such techniques may have a negligible impact on top-line accuracy, the adapted models exhibit subtle differences in output compared to the original model from which they are derived. In this paper, we introduce a new evasive attack, DIVA, that exploits these differences in edge adaptation by adding adversarial noise to input data that maximizes the output difference between the original and adapted model. Such an attack is particularly dangerous because the malicious input will trick the adapted model running on the edge but will be virtually undetectable by the original model, which typically serves as the authoritative model version used for validation, debugging, and retraining. We compare DIVA to a state-of-the-art attack, PGD, and show that DIVA is only 1.7–3.6% worse at attacking the adapted model but 1.9–4.2 times more likely to go undetected by the original model under whitebox and semi-blackbox settings.
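The idea of maximizing the output difference between an original and an adapted model can be illustrated with a toy sketch. The abstract does not state DIVA's actual loss or optimizer, so everything below is an assumption made for illustration: the two "models" are hypothetical linear softmax classifiers, edge adaptation is mimicked by a simple uniform weight quantizer, the objective is the gap between the two models' probabilities for the true class, and the update is a signed-gradient step with an L-infinity bound. The names `quantize`, `output_gap`, and `differential_attack` are invented here and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def quantize(w, bits=3):
    # Hypothetical stand-in for edge adaptation: uniform symmetric quantization.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def target_prob_and_grad(w, x, y):
    # For p = softmax(w @ x), return p[y] and the gradient of p[y] w.r.t. x.
    p = softmax(w @ x)
    dz = -p[y] * p           # d p[y] / d z[k] = p[y] * (delta_yk - p[k])
    dz[y] += p[y]
    return p[y], w.T @ dz

def output_gap(w_orig, w_adapt, x, y):
    # Assumed objective: original model's confidence in y minus adapted model's.
    p_orig, _ = target_prob_and_grad(w_orig, x, y)
    p_adapt, _ = target_prob_and_grad(w_adapt, x, y)
    return p_orig - p_adapt

def differential_attack(w_orig, w_adapt, x, y, eps=0.1, step=0.01, iters=50):
    # Signed-gradient ascent on the gap, projected onto an L-infinity ball
    # of radius eps around x. Returns the visited point with the largest gap.
    x_adv = x.copy()
    best_x, best_gap = x.copy(), output_gap(w_orig, w_adapt, x, y)
    for _ in range(iters):
        _, g_orig = target_prob_and_grad(w_orig, x_adv, y)
        _, g_adapt = target_prob_and_grad(w_adapt, x_adv, y)
        x_adv = x_adv + step * np.sign(g_orig - g_adapt)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        gap = output_gap(w_orig, w_adapt, x_adv, y)
        if gap > best_gap:
            best_x, best_gap = x_adv.copy(), gap
    return best_x

# Toy usage with random weights (illustrative data only).
rng = np.random.default_rng(0)
w_orig = rng.normal(size=(3, 8))
w_adapt = quantize(w_orig, bits=3)
x = rng.normal(size=8)
y = int(np.argmax(w_orig @ x))   # label the original model assigns to x
x_adv = differential_attack(w_orig, w_adapt, x, y)
```

Pushing the original model's confidence in the true class up while pushing the adapted model's confidence down is one plausible way to capture the "fools the edge model, passes the original" behavior described above. A real attack would target deep networks through an automatic-differentiation framework rather than hand-derived gradients.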