or How decentriq enables cross-company data collaboration in a GDPR-compliant way
[This blog has been co-authored by Anna Maria Tonikidou and Christian Meisser
from the legal services firm LEXR [https://www.lexr.ch]]
Mergers and acquisitions (M&A) were worth almost $2.49 trillion in the
first three quarters of 2019. Some prominent examples are Saudi Aramco/Saudi
Basic Industries Corporation, AbbVie/Allergan, Bristol-Myers Squibb/Celgene, and
United Technologies/Raytheon1. For any M&A, due diligence is essential to
confirm information about the two partners and to estimate the future value of
the potentially combined companies2.
Identifying the size of the shared customer base is an essential task done early
in the process. At this stage, however, the two companies are usually not legally
allowed to exchange their customer databases.
So far, the only workaround has been to hire a trusted third party who gathers
the data from both companies and compares them. This comes with a lengthy legal
process in itself, especially when the companies are in different jurisdictions
and different privacy regulations apply.
> What if the two companies could determine the size of their shared customer base
without having to share their data with anyone?
In this blog, we will first introduce the concept of private set intersection as
a generalization of the shared customers problem above. We then show how its
solution has been implemented in decentriq’s avato platform in a way that
removes the need for a trusted third party (not even decentriq). With avato,
both parties submit their customer databases to the platform and receive
specific security and privacy guarantees: provably, nobody (not even
decentriq) can access their unencrypted data, and only the size of the shared
customer base is output. While we focus here on private set intersection, avato
extends to many other use-cases, in particular privacy-preserving machine
learning.
Together with the legal services company LEXR, we will argue that this and
similar processing in avato is in line with the General Data Protection
Regulation (GDPR) because individual-level data are not shared.
Private set intersection
A private set intersection (PSI) is the process of determining the intersection
of two or more datasets (think lists of customer names) without revealing any of
the data to anyone. In the M&A case described above, this means calculating the
number of shared customers of two companies without disclosing any customer
information to either company or to any third party.
This situation is illustrated below. The customer databases are exemplified on
the left and the right, while the output is the number of shared customers which
in this case is two. As the set intersection should be private, the two customer
databases should be kept confidential at all times.
This is not a trivial task. Using a trusted third party comes with the lengthy
processes and costs discussed above. To avoid third parties, hashing approaches
have traditionally been applied. Unfortunately, none of them is really
satisfactory:
* Naïve approaches apply the same hashing function to the names in both
databases, exchange the result and compare the hashes3. The identical name
will have the same hash and can thus be identified as shared. As each party
knows the hashes of their customers, they can also infer the names of the
shared customers. This can already represent a violation of local privacy
regulations.
* More involved approaches use double-hashing techniques. These are more
complicated, susceptible to privacy attacks and most importantly still fail
in the common case of slight differences in the names – think “Freddy
Mercury” in one database vs “Fred Mercury” in the other.
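To make the first and the last of these shortcomings concrete, here is a minimal Python sketch of the naïve hashing approach. The customer names and the choice of SHA-256 are assumptions made purely for illustration; they are not taken from any real deployment.

```python
import hashlib


def hash_name(name: str) -> str:
    """Hash a customer name; both companies must use the same function."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


# Hypothetical customer lists, for illustration only.
company_a = ["Freddy Mercury", "Anna Smith", "John Doe"]
company_b = ["Fred Mercury", "John Doe", "Paul Jones"]

# Company A keeps a hash -> name table of its own customers.
hashes_a = {hash_name(name): name for name in company_a}
# Company B sends only the hashes of its customers to company A.
hashes_b = {hash_name(name) for name in company_b}

# Comparing the hashes yields the size of the shared customer base ...
shared_hashes = hashes_a.keys() & hashes_b
shared_count = len(shared_hashes)

# ... but company A can also map every shared hash back to a name.
names_learned_by_a = sorted(hashes_a[h] for h in shared_hashes)

# A small spelling difference produces a completely different hash,
# so the two "Mercury" entries are not recognized as the same customer.
variant_matches = hash_name("Freddy Mercury") == hash_name("Fred Mercury")

print(shared_count, names_learned_by_a, variant_matches)
```

In this toy example only one shared customer is found, company A learns exactly who that customer is, and the two spellings of the same person are missed.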
New developments come to the rescue. Recent advances in hardware-based
cryptography enable new, strictly superior solutions to the private set
intersection problem.
The key to privacy-preserving PSI is encrypted in an enclave
The avato platform leverages Intel’s Software Guard Extensions (Intel SGX
[https://software.intel.com/en-us/sgx]) technology to create so-called secure
enclave programs. These are isolated computer programs which can provide
additional security and privacy guarantees even when running on public cloud
infrastructure.
The figure below illustrates the situation. Anna and Paul work at the two
companies and are tasked with computing the size of their shared customer base
in a privacy-preserving and GDPR-compliant way. They decide to use an avato
secure enclave. After receiving the relevant security proofs, they locally
encrypt their customer databases and submit them into the secure enclave.
Provably, this particular secure enclave is the one and only program that can
ever decrypt this data. In the enclave, the identifiers are matched, and the
number of shared customers is sent back to Anna and Paul.
While postponing the technical details to the appendix section “How to provide
the security guarantees”, the use of an avato secure enclave gives Anna and Paul
the following security and privacy guarantees:
1. Only the particular enclave program Anna and Paul are connected to can
decrypt their customer databases.
2. Nobody can access the decrypted data, including decentriq and potential
infrastructure providers running avato.
3. The secure enclave only outputs privacy-preserving aggregate statistics such
as the number of shared customers.
Using avato provides Anna and Paul with a simple and safe way of performing the
private set intersection. Compared to other approaches, it does not require a
trusted third party or complicated algorithms while making it possible to use
more sophisticated matching algorithms (fuzzy matching) and outputting
additional privacy-preserving statistics. Crucially, as long as the above
guarantees hold and the output is non-personal data (e.g. the number of shared
customers), the described use of avato is in line with GDPR. This is discussed
in detail in the following section written by Anna Maria Tonikidou and Christian
Meisser from the legal services firm LEXR.
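Before turning to the legal analysis, here is a rough idea of what fuzzy matching with an aggregate-only output could look like. This is a sketch under stated assumptions, not avato's actual matching logic: the function name, the similarity threshold of 0.85, and the use of Python's standard `difflib` module are all our own choices for illustration. In the setting described above, such a function would run inside the secure enclave on the decrypted inputs, and only its integer result would leave the enclave.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity score between 0.0 and 1.0 for two normalized names."""
    return SequenceMatcher(None, a, b).ratio()


def count_shared_customers(names_a, names_b, threshold=0.85):
    """Return only the number of matched customers, never the names.

    Each name from the second list can be matched at most once.
    The threshold is a hypothetical tuning parameter.
    """
    remaining = [normalize(name) for name in names_b]
    shared = 0
    for name in (normalize(n) for n in names_a):
        # Pick the most similar name that has not been matched yet.
        best = max(remaining, key=lambda c: similarity(name, c), default=None)
        if best is not None and similarity(name, best) >= threshold:
            remaining.remove(best)
            shared += 1
    return shared


# Hypothetical customer lists, for illustration only.
company_a = ["Freddy Mercury", "Anna Smith", "John Doe"]
company_b = ["Fred Mercury", "john  doe", "Paul Jones"]

print(count_shared_customers(company_a, company_b))
```

With a threshold of 1.0 the function falls back to exact matching on normalized names, which shows how the slight spelling difference would otherwise be missed.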
Why using avato is in line with GDPR
The term 'personal data' is the entryway to the application of the GDPR.
'Personal data' is defined in Article 4 (1) GDPR as any information relating to
an identified or identifiable natural person. Such a person is referred to as a
data subject. The data subjects are identifiable if they can be directly or
indirectly identified. The definition of personal data is based on the realistic
risk of identification, and the applicability of data protection rules should be
based on risk of harm and likely severity.4
According to Recital 26 (5) GDPR, the principles of data protection should not
apply to anonymous information, namely information which does not relate to an
identified or identifiable natural person or to personal data rendered anonymous
in such a manner that the data subject is not or no longer identifiable.
In contrast to anonymous information, there is no mention of the qualification
of encrypted information in the GDPR, and so far, no EU/EEA court has explicitly
decided whether encrypted data is personal or not. However, the highest agency
for data protection regulation in Bavaria (Landesamt für Datenschutzaufsicht)
has concluded that encrypted data does not fall under the category of personal
data, under the premise that it is encrypted with strong state-of-the-art
encryption.5
Whether encrypted data are personal data therefore depends on the circumstances,
particularly on the means reasonably likely to be used (fair or foul) to
re-identify individuals.6 Factors affecting encrypted data's security against
decryption include the following:
* Strength of encryption method (the algorithm's cryptographic strength)
* Key management, such as security of decryption key storage, and key access
Under WP136, ‘anonymised’ data may be considered anonymous in a provider’s hands
if ‘within the specific scheme in which those other controllers (e.g. providers)
are operating, re-identification is explicitly excluded and appropriate
technical measures have been taken in this respect’.8
According to the UK Information Commissioner's Office (ICO), if (i) a party has
encrypted personal data itself and (ii) is responsible for managing the key, it
is processing data covered by the GDPR, since it has the ability to re-identify
individuals through decryption of that dataset.9 On that basis,
Hon/Millard/Walden suggest that if a party cannot view data, it cannot identify
data subjects, and therefore identification may be excluded by excluding others
from being able to access or read data.10 By analogy with key-coded data, to the
person encrypting personal data, such as a cloud user with the decryption key,
the data remain 'personal data'.11 However, in another person’s hands, such as a
cloud-based platform provider storing encrypted data without access to the key
and no means 'reasonably likely' to be used for decryption, the data may be
considered anonymous.12 This removes cloud providers from the scope of data
protection legislation, at least where data have been strongly encrypted by the
controller before transmission, and the provider cannot access the key.
With encryption, many of the parties who are processing the data do not have the
encryption key. The encryption key stays with the generator of the data. This is
the case with avato, meaning that encryption in this case bears similarities to
the effects of anonymization, as decentriq has no means of reversing the process
to access the raw data. In fact, decentriq has no way of knowing whether
personally identifiable information is contained in the sets transferred to
avato, and as such it would be impossible to define the scope of processing
within a data processing agreement with its clients. decentriq also has no
better chance of accessing the data than anyone who finds the key by accident.
avato's strong encryption therefore has effects similar to anonymization, i.e. it
turns personal data in the sense of the GDPR into non-personal data from the
point of encryption.
As a result of the above, for all intents and purposes, avato as a host of
encrypted data is not processing personal data under the definition of the GDPR.
decentriq cannot access that data, and even if its servers were breached, data
subjects would be at little risk from a privacy standpoint since the data would
also be unintelligible to the wrongdoers.
In this blog we have introduced the private set intersection problem and
motivated it with the use-case of a potential merger of two companies where the
number of shared customers should be computed privately. We have argued that
traditional approaches to this problem are not satisfactory and that new
technologies such as Intel SGX enable strictly superior solutions. One such
solution is decentriq’s avato platform, which enables provably
privacy-preserving computation on data. We argued that the use of avato is in
line with GDPR, even when the computation is performed on personally
identifiable data such as in the outlined case. Even though we have used the
example of private set intersection, this generalizes to the many more
confidential computing use-cases supported by avato.
If you are interested in learning more about avato and in receiving the full
GDPR assessment done by LEXR, reach out to firstname.lastname@example.org.
Appendix - How to provide the security guarantees
This section focuses on the technical details of performing a private set
intersection with avato. The section describes how the security and privacy
guarantees can be achieved.
To make things more concrete, this demo video [https://youtu.be/jnYCc_ojDHw]
shows how Anna and Paul use the dedicated Python API and web application to
create an avato secure enclave, get the security proofs, encrypt their data and
privately compute the number of shared customers.
Coming back to the security and privacy guarantees, the following points must be
ensured to achieve them:
1. The secure enclave program must only compute the number of shared customers
and delete all input data afterwards.
2. The cryptographic keys Anna and Paul use to encrypt their customer databases
must only be known to the particular secure enclave they are connected to.
3. The data decrypted by the secure enclave must not be accessible, even to
administrators with potential access to the operating system such as
decentriq and infrastructure providers.
This can be achieved using Intel’s SGX technology. Already in place for seven
years, the technology is available on most modern Intel CPUs. As outlined in
much more detail in this blog, Intel SGX-based secure enclave programs are
founded on two main security pillars.
As a first security pillar, a process called remote attestation allows a user to
confirm three things about a remotely running program: i) that it is a secure
enclave program; ii) its program logic (source code); and iii) a cryptographic
key that enables encrypting data in a way that can only be decrypted by that
particular program. A user performs remote attestation by inspecting a
particular piece of data received from the enclave. In the above figure these
data are indicated as security proofs. Using these security proofs, i) is
confirmed by checking a cryptographic signature that can only be produced by a
secure enclave program; ii) is checked by comparing a hash of the secure
enclave’s source code to an expected value; and iii) is achieved by using a
public key that is sent as part of the security proofs and whose private
counterpart is known only to the enclave (it was randomly generated when the
secure enclave was started). As remote attestation allows Anna and Paul to
verify what program is running remotely and to encrypt data only for this
particular secure enclave, this satisfies points 1 and 2.
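The three checks can be pictured with the following simplified Python sketch. It is a toy model and not the Intel SGX or avato API: the `SecurityProof` structure, its field names, and the `signature_is_valid` callback are hypothetical. In a real deployment the signature check would rely on Intel's attestation infrastructure, which is represented here only by a placeholder function supplied by the caller.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SecurityProof:
    """Hypothetical container for the data received from the enclave."""
    code_hash: bytes           # hash of the code the enclave is running
    enclave_public_key: bytes  # key generated when the enclave started
    signature: bytes           # produced by genuine enclave hardware


def hash_code(enclave_code: bytes) -> bytes:
    """Hash of the enclave code the user has reviewed and expects to run."""
    return hashlib.sha256(enclave_code).digest()


def verify_security_proof(
    proof: SecurityProof,
    expected_code_hash: bytes,
    signature_is_valid: Callable[[SecurityProof], bool],
) -> bytes:
    """Run checks i) to iii) and return the key to encrypt data with."""
    # i) The proof must be signed by a genuine secure enclave.
    if not signature_is_valid(proof):
        raise ValueError("proof was not signed by a genuine secure enclave")
    # ii) The enclave must run exactly the expected program.
    if not hmac.compare_digest(proof.code_hash, expected_code_hash):
        raise ValueError("enclave is running unexpected code")
    # iii) Only now is the enclave's public key trusted for encryption.
    return proof.enclave_public_key


# Illustration with made-up values and a stand-in signature check.
expected = hash_code(b"reviewed enclave program")
proof = SecurityProof(expected, b"enclave-public-key", b"signature-bytes")
trusted_key = verify_security_proof(proof, expected, lambda p: True)
print(trusted_key)
```

Anna and Paul would encrypt their customer databases only with the key returned by such a check, so a proof with a wrong code hash or an invalid signature never receives any data.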
As a second security pillar, enclave memory isolation and enclave memory
encryption protect the data sent into the secure enclave even from potential
system administrators. In order to process data, a CPU must read data from
memory. In traditional computing, these data must be unencrypted in order for
the CPU to perform computation on them. With Intel SGX, the CPU can read
encrypted data from memory because a dedicated encryption/decryption engine
inside the CPU handles the memory access of secure enclave programs. The
encryption/decryption is done on the fly within the CPU itself when enclave data
or code is leaving/entering the processor package. As a final protection, an
additional layer of memory address translation prevents access by non-enclave
programs to the secure enclave’s memory. Together, enclave memory encryption and
enclave memory isolation satisfy point 3.
We hope that this section has shed some light on how the security and privacy
guarantees can be achieved. The underlying technical details are quite involved,
and leveraging the powerful Intel SGX technology requires expert knowledge,
which is what decentriq specializes in. If you have any questions regarding the
security or use of the avato platform, reach out to email@example.com.
4) Ustaran E, European Data Protection Law and Practice, 44.
5) Tätigkeitsbericht 2017/18 - Bayerisches Landesamt für Datenschutzaufsicht,
6) Mourby M, Are pseudonymized data always personal data? Implications of the
GDPR for administrative data research in the UK, in Computer Law & Security
Review, 2018, Vol. 34, 224.
8) Opinion 4/2007 on the concept of personal data, WP136 (2007).
10) Hon/Millard/Walden, The problem of 'personal data' in cloud computing: what
information is regulated? – the cloud of unknowing, in International Data
Privacy Law, 2011, Vol. 1, No. 4, 219.