Is it a match? A privacy-preserving analytics tool

May 28, 2021
Buying data is a lengthy process on its own. Finding the right vendor and the right type of data, and clearing all the regulatory and legal procedures, takes time and costs money. In practice the procedure is even longer, because it starts before the decision to buy the data. Because of confidentiality and privacy concerns, identifying which dataset has the desired profiles and statistical qualities can often only be done after the acquisition process has started. This multiplies the cost and time needed by the number of datasets that are explored.

The way this problem is dealt with today is described in the image below. The data acquirer and the data holder both give their datasets (or their requirements) to a trusted third party, which then aggregates all the data and manually identifies the matching profiles.

Figure: The data sharing process today is pretty ancient

While this is cheaper and faster than going through the whole acquisition process, it still takes weeks and costs tens of thousands of dollars.

Case: Healthcare

One of the most prominent examples is the healthcare industry and the case for linking patient data. There are undeniable benefits in augmenting a dataset with different data about the same profiles (1, 2). Examples range from linking images with prescription data to following patients from a clinical trial into the "real world". With this kind of dataset, data analytics experts can accelerate drug development, and health care companies can deliver much better care by better understanding their patients and the effects of their drugs. However, combining and acquiring data runs into two blockers:

1. Finding the data that contains the relevant profiles is very costly and time-consuming.
2. Patient profile linking is time-consuming because of privacy and security concerns.

It's a match! From 2 weeks to 2 minutes

Following our commitment to make the lives of our customers simpler while preserving privacy and security, we created the avato platform [/closing-the-circle-of-security-on-the-cloud-decentriqs-avato-platform/], which includes the Private Set Intersection Instance (PSII). The PSII is based on the latest advancements in cryptography and provides a software solution that does not require involving a third party. It runs in any browser and can securely and privately:

1. Identify the overlap between an owned dataset and a target dataset on chosen keys (for example, patient profiles).
2. Ensure that, after the purchase of the dataset, linking the profiles is as frictionless as possible by anonymously linking the datasets, without any human ever seeing the whole data.
3. Allow paying only when the intersection threshold is met.

Private Set Intersection Instance - User story

With the Private Set Intersection Instance from the avato platform, no data is revealed or moved before the purchase.

1. Buyer Inc identifies a dataset that is potentially interesting for its needs from:
   a. Partnerships with data providers that are already in place
   b. decentriq’s listing of available datasets/providers
2. After getting in touch with the data holder, both Buyer Inc and Seller Inc download and open decentriq’s PSII and simply drag and drop their datasets onto the instance. The software extracts the metadata of both companies' datasets, encrypts the datasets and compares them to identify the linkage/overlap potential.
3. The users get a percentage score of how many profiles they have in common, and the data buyer receives some anonymized statistics regarding the target data.
4. If the parties decide to transact, our tool anonymously links their datasets without any party (including decentriq) ever seeing the complete, de-anonymized dataset. The data always stays local.
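To give a feel for how an overlap score like the one in steps 2 and 3 of the user story can be computed without either side handing over its raw keys, here is a minimal sketch of a classic Diffie-Hellman-style private set intersection. It is not a description of how the PSII is implemented: the protocol choice, the function names (hash_to_group, new_secret, blind, overlap_share, meets_threshold), the toy modulus and the 0.4 threshold are all assumptions made for illustration. Both parties are simulated in a single process, and the parameters are not meant to be secure.

```python
import hashlib
import math
import secrets

# Toy modulus (the prime 2**255 - 19). An assumption for illustration only;
# a real deployment would rely on a vetted library and a standard group.
P = 2**255 - 19


def hash_to_group(key: str) -> int:
    """Map a record key (e.g. a patient identifier) to a value in [1, P - 1]."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)


def new_secret() -> int:
    """Pick a secret exponent coprime to P - 1, so that raising to it
    modulo P is a permutation and distinct keys stay distinct."""
    while True:
        candidate = secrets.randbelow(P - 3) + 2
        if math.gcd(candidate, P - 1) == 1:
            return candidate


def blind(values, secret):
    """Raise every value to the secret exponent modulo P."""
    return [pow(v, secret, P) for v in values]


def overlap_share(buyer_keys, seller_keys) -> float:
    """Return the fraction of distinct buyer keys also held by the seller.

    Each side only ever sees the other's keys in blinded form.
    """
    buyer_set, seller_set = set(buyer_keys), set(seller_keys)
    if not buyer_set:
        return 0.0
    a = new_secret()  # known only to the buyer
    b = new_secret()  # known only to the seller

    # Each party blinds its own hashed keys and sends them across.
    buyer_once = blind([hash_to_group(k) for k in buyer_set], a)
    seller_once = blind([hash_to_group(k) for k in seller_set], b)

    # Each party applies its own secret to the values it received.
    buyer_twice = set(blind(buyer_once, b))    # computed by the seller
    seller_twice = set(blind(seller_once, a))  # computed by the buyer

    # Exponentiation commutes, so shared keys end up with equal values.
    return len(buyer_twice & seller_twice) / len(buyer_set)


def meets_threshold(share: float, threshold: float) -> bool:
    """Hypothetical check for a 'pay only above this overlap' rule."""
    return share >= threshold


share = overlap_share(["p1", "p2", "p3", "p4"], ["p3", "p4", "p5"])
print(f"overlap: {share:.0%}, proceed: {meets_threshold(share, 0.4)}")
```

With the sample keys above, two of the buyer's four profiles also appear on the seller's side, so the reported overlap is 50%, which passes the assumed 0.4 threshold. Only the count of matching blinded values is used here; which records match is not needed for the score.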
The promise of trusted analytics in healthcare

Artificial Intelligence has long been promised to revolutionize how the healthcare industry operates. While change has indeed happened in recent years, adoption has not been as extensive as some expected. The sensitivity of the underlying data, and the privacy-breaching power that combining them holds, are some of the biggest hurdles to data cooperation in the industry. We demonstrate that this dilemma no longer holds.