[The research for this blog was done by Nil Adell Mill
[https://www.linkedin.com/in/niladellmill/]. Editing and contextualizing by
Nikolas Molyndris and David Sturzenegger]
The question of data privacy in machine learning is still widely overlooked.
When the privacy loss comes from leakage by a machine learning model, the
topic quickly becomes convoluted, adding to the overall mistrust of these
still misunderstood systems. Bringing trust into these processes requires
unwinding the complexity with safe and simple solutions. In this blog, we
propose a new machine learning training metric that helps reduce the risk of
data re-identification. It can be embedded in any model training with minimal
Data leakage may come in many forms, from the retrieval of particular attributes
to the recovery of the original training samples. This becomes even more
relevant for Machine Learning-as-a-Service (MLaaS) systems, like those offered
by Google or Amazon, where users may not be experts and have only limited
access to the underlying pipeline used to train the models. In fact, a recently
[https://arxiv.org/abs] published article presented an approach that allows
someone to tell whether a given image was part of the dataset used to train an
MLaaS model. More strikingly, the method does not require full access to the
model, only access to its inputs and outputs once it has been released. This
setting is known as black-box access.
This particular type of attack, known as membership inference (MI), is regarded
as the easiest kind of attack to implement, which makes it a good candidate to
test against. For the widely discussed risk of face image reconstruction,
membership inference is a crucial first step. If you cannot even tell if an
image was used to train a model or not, how can you even start reconstructing
It is not yet clearly understood where, when, and why data leakage happens in
machine learning models. While a significant amount of time has been spent on
creating these "attacks" and showing the insecurity of the models, there is very
little research on how a data scientist can defend against them. This is an
increasingly important topic, as more important decisions are being delegated
to these models and more sensitive data is being fed into them. Among industry
practitioners, overfitting is seen as one of the main causes of data leakage:
unsurprisingly, the more a model memorizes a dataset, the easier it is for it
to leak data. Previous research has studied and observed this relationship (see
footnote 1) while, at the same time, pointing out other apparent causes such as
model capacity. The size and distribution of the dataset is another important
factor. As a rule of thumb, the more data per label one has, the less likely it
is for a particular piece of data to leak. The type of problem that the model
is designed to solve also plays a big role. For example, it has been
demonstrated that predictive text models like Gmail's may leak data such as
credit card or social security numbers.
At the moment there is still no simple way to assess leakage, and it remains a
widely overlooked aspect of machine learning models. We propose a new metric for
training machine learning models that tracks the risk of membership inference,
or simply the MI-metric. The metric enables data scientists to rank models
according to how likely they are to leak data. In the same way one monitors
validation accuracy during training, the MI-metric tracks the leakage risk.
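As a small illustration of ranking models by leakage risk, the sketch below orders two runs by their MI accuracy. The figures are the two ADAM rows reported in Table 1; the `Run` type and the `rank_by_leakage_risk` helper are hypothetical names introduced only for this example.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    accuracy: float      # main-model accuracy, in percent
    mi_accuracy: float   # MI-metric (attack-model accuracy), in percent


# Values taken from the two ADAM rows of Table 1 (Mobilenet).
runs = [
    Run("ADAM, lr 0.001, weight decay 1e-4", 62.43, 60.83),
    Run("ADAM, lr 0.001, weight decay 1e-3", 65.48, 56.07),
]


def rank_by_leakage_risk(runs):
    """Order runs from lowest to highest MI accuracy (least to most leak-prone)."""
    return sorted(runs, key=lambda run: run.mi_accuracy)


ranked = rank_by_leakage_risk(runs)
for run in ranked:
    print(f"{run.name}: accuracy {run.accuracy}, MI accuracy {run.mi_accuracy}")
```

In practice one would likely weigh this ranking against task accuracy rather than use it alone.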
Our proposal is a second model, the attack model, that performs membership
inference attacks on the primary model, the main model, at every epoch. Since
leakage risk is mostly associated with potential adversarial attacks in a
black-box setting, the attack model is given the main model's predictions at
the current epoch. The MI-metric is the accuracy of the attack model.
This evaluates how successful an attacker could be before we expose our model to
the world. The underlying philosophy is comparable to doing penetration testing
in software. By analyzing membership inference, we test against one of the
easiest types of attacks that can be performed on a model — if you can't verify
that a sample was in the training dataset there's nothing else you can really
extract about that sample.
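The per-epoch evaluation can be pictured as one extra measurement in an ordinary training loop. The following is a minimal sketch under stated assumptions: `train_with_mi_tracking` and its three callables are hypothetical placeholders, not part of any framework, and the stand-in lambdas exist only so the snippet runs on its own.

```python
def train_with_mi_tracking(n_epochs, train_one_epoch, validation_accuracy, mi_metric):
    """Train for n_epochs and record validation accuracy and MI-metric per epoch."""
    history = []
    for epoch in range(n_epochs):
        train_one_epoch(epoch)  # one pass of normal main-model training
        history.append({
            "epoch": epoch,
            "val_accuracy": validation_accuracy(),  # usual performance metric
            "mi_metric": mi_metric(),               # attack-model accuracy
        })
    return history


# Stand-in callables (assumptions); real ones would wrap the main and attack models.
history = train_with_mi_tracking(
    n_epochs=3,
    train_one_epoch=lambda epoch: None,
    validation_accuracy=lambda: 0.0,
    mi_metric=lambda: 0.5,
)
```

Keeping both values in the same history makes it easy to plot them together and spot the epoch at which leakage risk starts to rise.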
This approach has the added benefit of simulating a stronger attack than a
malicious party could mount since, as we will show, the attack model knows the
original dataset and how it is partitioned. The attack model has full knowledge
of the ground truth, while an attacker would have to come up with a proxy, a
shadow model [3,6], against which to train their membership inference model.
We propose Algorithm 1, which combines normal model training with a per-epoch
evaluation of the MI-metric, measuring the susceptibility to membership
inference attacks. First, from the training and validation data we generate a
new membership inference dataset. The samples coming from the training set are
labeled as members (1's) and those from the validation set as non-members
(0's). This MI dataset is then randomly divided into MI training and testing
datasets. At every epoch, we pass the samples from the MI dataset through the
main model, collect the predictions, sort them (see footnote 2), and train the
attack model with these values as inputs and the membership as labels.
Once the attack model is trained, the MI-metric is defined as its accuracy. It
indicates to what extent a model could reveal whether a sample was used during
training.
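The MI evaluation for a single epoch can be sketched roughly as follows. This is an illustration under several assumptions, not the implementation used for the results: the attack model here is a plain logistic regression fitted by gradient descent (a stand-in for the linear classifier), predictions are sorted in descending order, the 25% test split is arbitrary, and the prediction vectors are synthetic. All function names are hypothetical.

```python
import math
import random


def build_mi_dataset(train_preds, val_preds, test_fraction=0.25, seed=0):
    """Label sorted main-model predictions as members (1) or non-members (0)."""
    samples = [(sorted(p, reverse=True), 1) for p in train_preds]
    samples += [(sorted(p, reverse=True), 0) for p in val_preds]
    random.Random(seed).shuffle(samples)  # random MI train/test division
    n_test = int(len(samples) * test_fraction)
    return samples[n_test:], samples[:n_test]


def train_attack_model(mi_train, epochs=500, lr=1.0):
    """Fit a logistic-regression attack model with batch gradient descent."""
    n_features = len(mi_train[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in mi_train:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = 1.0 / (1.0 + math.exp(-z)) - label
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - lr * g / len(mi_train) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(mi_train)
    return weights, bias


def mi_metric(attack_model, mi_test):
    """MI-metric: accuracy of the attack model on the MI testing dataset."""
    weights, bias = attack_model
    correct = 0
    for features, label in mi_test:
        z = bias + sum(w * x for w, x in zip(weights, features))
        correct += int((z > 0) == (label == 1))  # z > 0 means "member"
    return correct / len(mi_test)


# Synthetic main-model outputs: confident on training samples, unsure otherwise.
member_preds = [[0.6 * (1 - t), t, 0.4 * (1 - t)]
                for t in (0.86, 0.88, 0.90, 0.92, 0.94)] * 2
non_member_preds = [[0.55 * (1 - t), 0.45 * (1 - t), t]
                    for t in (0.40, 0.42, 0.44, 0.46, 0.48)] * 2

mi_train, mi_test = build_mi_dataset(member_preds, non_member_preds)
attack_model = train_attack_model(mi_train)
metric = mi_metric(attack_model, mi_test)
print(f"MI-metric for this epoch: {metric:.2f}")
```

The synthetic predictions mimic a heavily overfitted main model, so the attack succeeds easily; a well-generalizing model would give member and non-member predictions that look alike and an MI-metric closer to chance.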
Algorithm 1. Our implemented algorithm.
So, what should we use as an attack model? Before answering that question, we
have to choose a dataset and a main model that will be defended against
leakage.
Regarding the dataset, we focused on analyzing the results of different main
models applied to the SVHN, CIFAR-10, and CIFAR-100 datasets. We will mainly
focus on the CIFAR-100 dataset because it has a relatively low number of images
per label, which makes it more likely for a model to memorize data from a
With respect to the main model, we trained three different models, Mobilenet,
Resnet50, and VGG16, with different optimizers. Regarding the hyperparameters,
we focused on exploring the learning rate and different values of weight decay.
The latter has often been pointed out in the membership inference literature
[3,6,10] as a possible way to avoid data leakage. On top of that, we used a
stepped learning rate decrease (a reduction of 80% at particular epochs).
Models that are easier to train, like Mobilenet, were explored in more depth.
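To make the stepped learning rate decrease concrete, here is a minimal sketch in which the rate is cut by 80% at each milestone epoch. The milestone epochs and the function name are assumptions for illustration; the post does not state at which epochs the reductions happened.

```python
def stepped_learning_rate(base_lr, epoch, milestones, reduction=0.8):
    """Learning rate after an 80% cut at every milestone epoch already reached."""
    drops = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * (1.0 - reduction) ** drops


milestones = [30, 60]  # hypothetical epochs, not taken from the experiments
schedule = [stepped_learning_rate(0.1, epoch, milestones) for epoch in (0, 30, 60)]
print(schedule)
```

With a base rate of 0.1, each milestone leaves one fifth of the previous rate, so the rate shrinks geometrically across milestones.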
Finally, a large repertoire of models can be used as the attack model. We
tested two in particular: a linear classifier and gradient boosted trees
(XGBoost [https://xgboost.readthedocs.io/en/latest/]). Although the linear
classifier obtained qualitatively similar results to XGBoost, the latter
consistently showed superior prediction accuracy, so we only report the XGBoost
results. In addition, selecting the attack model is a relatively cheap task
compared to the whole training pipeline, hence it can be a separate task with
virtually no added computational cost.
Looking at the results on all the datasets (as shown in Figure 1), one can
qualitatively assess the apparent correlation between overfitting and membership
inference accuracy. Often, when the performance on the test set stops improving,
the MI-metric starts increasing. This relationship to overfitting has been
observed repeatedly in previous works [3,4]. Nevertheless, overfitting alone
cannot always explain the propensity of a model to leak data. Some of our
results demonstrate this phenomenon.
Figure 1. Performance of Resnet50 under different values of weight decay.
Interestingly enough, in this particular case, the amount of data leaked
appeared to be proportional to the weight decay. The optimizer was SGD with the
learning rate fixed at 0.01.
The maximum MI accuracy we were able to achieve was 68.8%. Although this is an
improvement over flipping a coin, it is far from a great predictor of
membership. Bearing in mind that we made an ad-hoc choice of attack model, this
still indicates that, even in the worst case, membership inference attacks may
not be easy to pull off.
There is a significant spread of the MI-metric across the different results
(Tables 1-3), which does not always go hand-in-hand with the accuracy of the
model. The Mobilenet results for ADAM with a learning rate of 0.001 in Table 1
show this: as the weight decay increases, the accuracy of the model improves
while the MI-metric decreases (from 61.3% to 65.4% for the former, and from
62.8% to 56.1% for the latter).
The opposite effect can be found too. In models trained with SGD (Mobilenet in
Table 1, Resnet50 in Figure 1), weight decay makes the model perform better
while also increasing the MI-metric. That can be graphically seen in Figure 1,
where the weight decay value seems to be proportional to the MI-metric. The same
observation, however, does not seem to hold when ADAM is used as an optimizer.
Optimizer            | Learning rate | Weight decay | Accuracy | MI accuracy
SGD (see footnote 3) | 0.1           | 0            | 60.72    | 59.03
SGD                  | 0.1           | ...          | ...      | ...
ADAM                 | 0.001         | 1.00E-04     | 62.43    | 60.83
ADAM                 | 0.001         | 1.00E-03     | 65.48    | 56.07
Table 1. Mobilenet results
Optimizer | Learning rate | Weight decay | Accuracy | MI accuracy
SGD       | 0.01          | 0.00E+00     | 73.53    | 55.97
...       | 0.01          | 5.00E-04     | 56.94    | 56.27
Table 2. Resnet50 results
Optimizer | Learning rate | Weight decay | Accuracy | MI accuracy
SGD       | 0.1           | 1.00E-05     | 68.24    | 67.21
SGD       | ...           | ...          | ...      | ...
...       | ...           | 5.00E-05     | 40.18    | 49.78
Table 3. VGG16 results
There is a growing concern for the capacity of deep learning models to memorize
and leak data. Previous work has demonstrated successful membership inference
attacks on models trained on MLaaS platforms. This work introduced a metric that
monitors potential model data leakage. The metric can be reported alongside
other standard performance metrics during training. The proposed metric uses
membership inference accuracy of an attack model as a proxy. Computing this
metric comes at a very low computational cost due to the overall simplicity of
its execution. The proposed approach is model and framework agnostic: it can be
used in any training situation.
Our results were mostly in line with those previously reported in the
literature. Nevertheless, there were situations where hyperparameter
configurations yielded unexpected results, e.g. higher membership inference
scores. This motivates further research into the underlying reasons for data
leakage. It also justifies a direct empirical measure of the model’s
susceptibility to membership inference attacks, such as the MI-metric proposed
in this paper.
At decentriq we want to enable machine learning applications without worrying
about the sensitivity of the underlying data. Making sure that the deployed
algorithms protect this data is a significant step towards this.
[1] Oh, Seong Joon, Bernt Schiele, and Mario Fritz. "Towards reverse-engineering
black-box neural networks." Explainable AI: Interpreting, Explaining and
Visualizing Deep Learning. Springer, Cham, 2019. 121-144.
[2] Fredrikson, Matthew, et al. "Privacy in pharmacogenetics: An end-to-end case
study of personalized warfarin dosing." 23rd USENIX Security Symposium (USENIX
Security 14).
[3] Shokri, Reza, et al. "Membership inference attacks against machine learning
models." 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017.
[4] Yeom, Samuel, et al. "Privacy risk in machine learning: Analyzing the
connection to overfitting." 2018 IEEE 31st Computer Security Foundations
Symposium (CSF). IEEE, 2018.
[5] Carlini, Nicholas, et al. "The secret sharer: Measuring unintended neural
network memorization & extracting secrets." arXiv preprint arXiv:1802.08232
[6] Salem, Ahmed, et al. "Ml-leaks: Model and data-independent membership
inference attacks and defenses on machine learning models." arXiv preprint
[7] Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural
networks for mobile vision applications." arXiv preprint arXiv:1704.04861
[8] He, Kaiming, et al. "Deep residual learning for image recognition."
Proceedings of the IEEE conference on computer vision and pattern recognition.
[9] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for
large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[10] Sablayrolles, Alexandre, et al. "Déjà Vu: an empirical evaluation of the
memorization properties of ConvNets." arXiv preprint arXiv:1809.06396 (2018).
[11] Chakraborty, Supriyo, et al. "Interpretability of deep learning models: a
survey of results." 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing,
Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big
Data Computing, Internet of People and Smart City Innovation. IEEE, 2017.
1. It is also worth mentioning that deep learning models have been reported
to first learn patterns and structures of the data. Then, if further
trained, they start fitting random noise, which is akin to memorizing the
data they are being given.
2. In general, we only care about the shape of the output, so the actual label
that the model is outputting is not so important. Sorting allows the
features to be ordered in a consistent manner. Other authors have pointed
out, however, that data leakage may differ depending on the class (for
instance when classes are unbalanced). Our method does not take that
information into account for its classification.
3. Whenever we refer to SGD, the actual optimizer used is stochastic gradient
descent with Nesterov momentum (of 0.9).