Repeated knowledge distillation with confidence masking to mitigate membership inference attacks

F. Mazzone, L. van den Heuvel, M. Huber, C. Verdecchia, M. Everts, F. Hahn, A. Peter
Proceedings of the 15th ACM Workshop on Artificial Intelligence and Security, 2022. dl.acm.org
Machine learning models are often trained on sensitive data, such as medical records or bank transactions, posing high privacy risks. In fact, membership inference attacks can use the model parameters or predictions to determine whether a given data point was part of the training set. One of the most promising mitigations in the literature is Knowledge Distillation (KD). This mitigation consists of first training a teacher model on the sensitive private dataset and then transferring the teacher's knowledge to a student model by means of a surrogate dataset. The student model is then deployed in place of the teacher model. Unfortunately, KD on its own gives users little flexibility, understood as the ability to decide arbitrarily how much utility to sacrifice in exchange for membership privacy. To address this problem, we propose a novel approach that combines KD with confidence score masking. Concretely, we repeat the distillation procedure multiple times in series and, during each distillation, perturb the teacher's predictions using confidence masking techniques. We show that our solution provides more flexibility than standard KD, as it allows users to tune the number of distillation rounds and the strength of the masking function. We implement our approach in a tool, RepKD, and assess our mitigation against white- and black-box attacks on multiple models and datasets. Even when the surrogate dataset differs from the private one (which we believe to be a more realistic setting than is commonly found in the literature), our mitigation makes the black-box attack completely ineffective and significantly reduces the accuracy of the white-box attack at the cost of only 0.6% test accuracy loss.
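
The abstract describes the approach only in prose; the following is a minimal PyTorch sketch of how a repeated-distillation loop with confidence masking might look. The helper names (mask_confidences, train_supervised, distill_once, repeated_kd), the top-k masking variant, and all hyperparameters are illustrative assumptions, not the authors' RepKD implementation.

```python
import torch
import torch.nn.functional as F


def mask_confidences(probs: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Confidence masking (an assumed top-k variant): keep only the k
    largest class probabilities and renormalize. This hides the fine-grained
    confidence information that membership inference attacks exploit."""
    vals, idx = probs.topk(k, dim=1)
    masked = torch.zeros_like(probs).scatter_(1, idx, vals)
    return masked / masked.sum(dim=1, keepdim=True)


def train_supervised(model, loader, epochs=5, lr=1e-3):
    """Plain supervised training of the initial teacher on the private data."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model


def distill_once(teacher, student, surrogate_loader, epochs=5, lr=1e-3, k=1):
    """One distillation round: the student matches the teacher's *masked*
    soft predictions on the surrogate dataset (surrogate labels unused)."""
    teacher.eval()
    student.train()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in surrogate_loader:
            with torch.no_grad():
                targets = mask_confidences(F.softmax(teacher(x), dim=1), k)
            loss = F.kl_div(F.log_softmax(student(x), dim=1), targets,
                            reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student


def repeated_kd(make_model, private_loader, surrogate_loader, rounds=3, k=1):
    """Repeated KD: each round's student becomes the next round's teacher."""
    teacher = train_supervised(make_model(), private_loader)
    for _ in range(rounds):
        teacher = distill_once(teacher, make_model(), surrogate_loader, k=k)
    return teacher  # deploy the final student in place of the original teacher
```

In this sketch, increasing `rounds` or lowering `k` (stronger masking) trades test accuracy for membership privacy, mirroring the two tunable knobs the abstract mentions.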