The Key to Sensitive Data Analytics is Encrypted in an Enclave

May 28, 2021
At decentriq we aim to solve a simple problem: we want to make data collaborations simple and safe. Combining datasets across organizations can unlock huge value, but companies are reluctant to share data due to sensitivity concerns. This usually means either involving a trusted third party through a complicated process, or not doing the project at all. In this blog we explore how we remove the need for a trusted third party (even decentriq!) by using Intel® Software Guard Extensions (Intel® SGX).

This is the first in a series of posts explaining this technology and how it allows us to make data collaborations simple and safe. This post gives a high-level overview of the concepts; the following posts will become more technical. Ready? Let’s dive in!

A dive into the Secure Enclaves

Secure Enclaves, also known as trusted execution environments, are isolated environments in which computer programs run with additional privacy and security guarantees. Figuratively speaking, a Secure Enclave is a safe box inside a computer processor that lets data in, locks it away, computes on it, and then ships the results out. While this process sounds simple (and in principle it is), there are multiple aspects that make the end-to-end implementation, let’s say… interesting.

At decentriq, we use the most advanced type of Secure Enclave, Intel’s SGX, to guarantee the safety and confidentiality of our users’ data. Intel SGX is basically an extension of the instructions a CPU can perform, coupled with some extra circuits on the actual processor. These instructions and circuits enable running software “safe boxes” (aka enclaves) that are isolated from all the other processes. In simple terms, this means that other software on the machine cannot look at the data being processed inside an enclave.
This is achieved by enclave-memory encryption: all enclave data is encrypted whenever it leaves the CPU, so that no other process can read it. This is often referred to as encryption of data in use (while computing on it), as opposed to only at rest or in transit (on a hard disk or when sending it over a network). Additionally, these instructions allow the enclaves to identify themselves and their code to remote users.

But what does “identify themselves” mean? Next to the isolation mentioned before, this is the core concept that any good Secure Enclave needs so that a user can trust it. Simply put, it means that the Secure Enclave can prove it is doing what the user expects. As a user, before I send my data to a remote program, I want to be sure that this program is what I expect. Without enclaves, there is no way of getting such a proof. Intel SGX enables an enclave to provide this proof through a process called remote attestation.

Still here? Good! We know from experience that hearing these concepts for the first time can be somewhat intimidating. This is why we will take an even deeper dive into enclave-memory encryption and remote attestation in a second blog post. For now, let’s take these concepts for granted and look at how they translate into the guarantees our avato platform offers.

The avato guarantees

Remember that avato aims to make data collaborations simple and safe. At the highest level, a data collaboration consists of multiple parties putting data into avato, and one or more parties performing computations inside avato and receiving results. To make it as safe as possible for our users, we provide some guarantees to them. Which we. take. very. seriously.

Figure: The high level process of working with avato

Guarantee 1 - The datasets can only be decrypted inside the enclave.

This guarantee means that nobody can look at the data on its way to the enclave.
It is achieved for each user by encrypting their data with well-known public-key encryption (in fact, the internet runs on it), using the enclave’s public key. This public key is received as part of the remote attestation process, when the enclave identifies itself to the user.

Guarantee 2 - Only the intended result leaves the enclave; in other words, none of the datasets can be inspected by any third party.

This guarantee means that nobody can look at the data while it is “in” the enclave. It is achieved by the fact that enclave memory stays encrypted during the computations as well. Simply put, not even the operating system is able to read or tamper with the data. On top of that, we provide you with a way to validate that only the expected code is running in the enclave, by giving you the possibility to reconstruct an identical enclave from open-source material. Hence, we expose ourselves to public scrutiny and guarantee that only the expected code is running and nothing else.

Guarantee 3 - Only pre-specified users can decrypt the result.

This is achieved by having the enclave create one encrypted result per pre-specified user, each encrypted with that user’s key.

Wrap up - The road to sensitive data analytics

In this blog we took a shot at explaining a fairly complicated technology in a straightforward way. Our mission here at decentriq, however, is pretty simple: we want to make sensitive data analysis simple and safe. Intel SGX is an extremely useful technology that allows us to offer our clients capabilities that were impossible before, at a scale and speed that fits directly into current workflows.

We hope that this blog gives you a better idea of what we are working with. In the next posts of this series we are going to expand on two different aspects: a deeper technical dive on SGX, and a post about securing data privacy with SGX while running Machine Learning models. In the meantime, if you prefer listening, Intel recently featured us in their podcast series.
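
Until that deeper dive arrives, the idea behind Guarantee 2 (checking that only the expected code is running) can be illustrated with a rough sketch. It is a simplification under stated assumptions, not decentriq’s or Intel’s actual implementation: here the enclave’s “measurement” is simply a SHA-256 hash of its code bytes, whereas real SGX computes its measurement while the enclave is being built and delivers it inside a signed attestation report. The function names `measure` and `is_expected_enclave` and the sample byte strings are hypothetical.

```python
import hashlib
import hmac


def measure(enclave_code: bytes) -> str:
    """Return a hex SHA-256 digest standing in for an enclave measurement.

    Assumption: real SGX measurements are derived differently; a plain
    hash of the code bytes is used here only for illustration.
    """
    return hashlib.sha256(enclave_code).hexdigest()


def is_expected_enclave(reported_measurement: str, rebuilt_code: bytes) -> bool:
    """Compare the measurement reported by a remote enclave with the one
    the user computes locally from a build of the open-source material."""
    expected = measure(rebuilt_code)
    # compare_digest performs the comparison in constant time
    return hmac.compare_digest(reported_measurement, expected)


# Hypothetical example data: the code the user rebuilt from source ...
open_source_build = b"enclave code built from published sources"
# ... and what two remote enclaves report about themselves.
honest_report = measure(open_source_build)
tampered_report = measure(b"enclave code with an extra data-leaking step")

print(is_expected_enclave(honest_report, open_source_build))    # True
print(is_expected_enclave(tampered_report, open_source_build))  # False
```

In this sketch the user would only go on to encrypt and upload data when the check returns `True`. Any change to the enclave code, however small, yields a different measurement and the check fails.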
For any questions please don’t hesitate to contact us!