GDPR-compliant AI/ML using Confidential Computing

Conclave Nov 23 2022 By: Sneha Damle





R3 provides GDPR-compliant solutions, answering the million-dollar data protection question. The General Data Protection Regulation (GDPR), Europe’s framework for data protection law, continues to have a significant impact on industries that handle sensitive data. The healthcare industry in particular faces many hurdles in complying with it.

GDPR requires “consent” as one lawful basis to process “personal data.” An additional lawful basis for processing “special category personal data” (such as health and genetic data) exists where such processing is “necessary for scientific or historical research purposes.”

There needs to be a balanced approach to protect patients’ privacy while ensuring patient data can be shared for healthcare and research purposes.

Whenever issues linked to data protection are under discussion, it is all too easy to get distracted from the one simple point that attracted us to the discussion in the first place: the fact that there are many millions of patients across Europe who have unmet health needs. New treatments are only going to come from medical research, and the use of patient data will play a crucial role in this[…]

Nick Meade, Director of Policy at Genetic Alliance UK.

Sensitive personal data benefit from additional protection in EU law. Unauthorized access to sensitive personal data can impact a patient’s personal and professional life. But at the same time, as mentioned by Nick, patient health records are essential for advancements in research and public health promotion.

What is the problem?

Applying AI/ML modeling to healthcare in a way that is GDPR-compliant.

To help future generations live longer, healthier lives, Our Future Health has invited three million people to provide blood samples for analytics that can detect chronic diseases early or identify people at higher risk of a condition before it develops. This can be achieved by training an AI/ML model on health data to accurately detect the possibility of a certain disease before it develops. Many people want to participate in this promising research. However, one concern holds them back.

How can their data privacy be maintained?

To improve the accuracy of the model’s predictions, Our Future Health needs a way to train the model on data from hospitals all over the world. As a result, it may consider moving the model’s training and prediction onto the cloud. However, there are several issues here. Since the model encodes private and personal information, the organization will want to keep all model parameters, such as weights and biases, confidential. At the same time, hospitals will want to keep their patient data secure, given its sensitive nature.

Hence the million-dollar question – can we achieve both? Can health organizations protect model parameters even though the model is deployed in the cloud, while hospitals contribute sensitive patient data towards research advancements without compromising on the protection of patient privacy?

Current GDPR-compliant solutions

Secure multi-party computation (MPC) and fully homomorphic encryption (FHE) are two solutions available today that can be used to confidentially train an AI model. But again, how practical are MPC and homomorphic encryption? Recent work suggests large runtime overheads, which limits their practical adoption for compute-intensive analyses of large datasets. For example, Falcon estimates that training a neural network like AlexNet on a dataset like MNIST can take from a few weeks to hundreds of weeks.

Proposed GDPR-compliant solution

Multi-Party Training an AI Model using Conclave and Tribuo

We at R3 have built Conclave, a privacy-preserving multi-party machine learning platform based on trusted Intel SGX processors. The basic idea of a trusted Intel SGX processor is that it creates a secure region of memory address space called an enclave. Code and data inside the enclave’s protected memory cannot be read or modified by any external process. SGX guarantees the confidentiality and integrity of code and data within the enclave even if the underlying host operating system, the hypervisor, and the BIOS are compromised or malicious. Enclave data is encrypted and authenticated by a key known only to the enclave.
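The last point, that enclave data is encrypted and authenticated under a key only the enclave holds, is what makes it safe to spill model state out of protected memory. The sketch below illustrates that "sealing" idea in plain Java using AES-GCM (authenticated encryption). It is a conceptual stand-in only: in real SGX the sealing key is derived by the hardware and never exists outside the enclave, and the class and method names here are invented for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// Illustrative "sealing": confidentiality + integrity under an enclave-held key.
public class EnclaveSealing {
    private static final int GCM_TAG_BITS = 128;
    private static final int IV_BYTES = 12;

    // In real SGX this key is derived by the CPU and never leaves the enclave;
    // here we simply generate one in software for the demonstration.
    static SecretKey enclaveKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        return kg.generateKey();
    }

    // Encrypt-and-authenticate: a host that flips even one bit of the sealed
    // blob will cause unseal() to fail, not return corrupted plaintext.
    static byte[] seal(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        byte[] ct = c.doFinal(plaintext);
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    static byte[] unseal(SecretKey key, byte[] sealed) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key,
               new GCMParameterSpec(GCM_TAG_BITS, Arrays.copyOfRange(sealed, 0, IV_BYTES)));
        return c.doFinal(Arrays.copyOfRange(sealed, IV_BYTES, sealed.length));
    }
}
```

Because only the enclave knows the key, the untrusted host can store or forward the sealed bytes but can neither read nor tamper with them undetected.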


Intel SGX supports remote attestation, which proves to a remote party exactly which code and data are loaded inside the enclave; third parties can verify this before entrusting the enclave with sensitive data. Conclave supports saving the model to external storage in an encrypted format, which can later be retrieved and loaded again. Conclave is open source, which lets the community collaborate by inspecting the code. R3 also provides a confidential computing cloud platform that helps you build and deploy privacy-preserving applications in minutes.
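At the heart of that attestation step is a simple comparison: the client checks a cryptographic measurement of the enclave's code against the value it expects. The sketch below shows just that hash-comparison step in plain Java. It is a deliberate simplification, with invented names: real SGX attestation involves a hardware-signed quote over the measurement, verified through Intel's attestation infrastructure, not a bare hash check.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Illustrative client-side check: trust the enclave only if its code
// measurement matches the value the client expects.
public class AttestationCheck {

    // SGX computes a measurement (MRENCLAVE) over the enclave's initial
    // code and data; here we approximate it with a plain SHA-256 digest.
    static byte[] measure(byte[] enclaveCode) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(enclaveCode);
    }

    // Constant-time comparison against the expected measurement.
    static boolean trust(byte[] enclaveCode, byte[] expectedMeasurement) throws Exception {
        return MessageDigest.isEqual(measure(enclaveCode), expectedMeasurement);
    }
}
```

The point for hospitals is that if anyone, including the cloud operator, changes the enclave's code, the measurement changes and the check fails, so data is only ever released to the exact code that was audited.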

Tribuo and Conclave

We have built a sample that shows how to deploy an AI model that detects breast cancer to an SGX enclave. It uses Tribuo, a machine learning library written in Java.

The sample demonstrates how easy it is to deploy this model using Tribuo and Conclave. In the sample, multiple hospitals send their data to the enclave to train the model. Inside the enclave, the application loads a simple Tribuo model, and aggregates the data from all clients. This data is used to train the model. Finally, the sample demonstrates how to test the model using the data provided by the hospitals and send the evaluation results back to all the hospitals.
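The enclave-side flow of the sample, aggregating batches from several hospitals, training on the pooled data, and evaluating, can be sketched in plain Java. The sketch below substitutes a tiny hand-rolled logistic-regression trainer for the Tribuo model so it stays self-contained; the class and method names are invented for illustration and do not reflect the Conclave or Tribuo APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative enclave-side trainer: pool data from many hospitals,
// train one model, evaluate, and return the results to every client.
public class EnclaveTrainer {
    record Example(double[] x, int label) {}

    private final List<Example> pool = new ArrayList<>();

    // Each hospital's (already decrypted) batch is added to the shared pool.
    void receiveBatch(List<Example> batch) { pool.addAll(batch); }

    // Gradient-descent logistic regression over the aggregated pool;
    // the real sample delegates this step to a Tribuo trainer.
    double[] train(int dims, int epochs, double learningRate) {
        double[] w = new double[dims + 1];              // last slot is the bias
        for (int e = 0; e < epochs; e++) {
            for (Example ex : pool) {
                double err = ex.label() - predictProb(w, ex.x());
                for (int i = 0; i < dims; i++) w[i] += learningRate * err * ex.x()[i];
                w[dims] += learningRate * err;
            }
        }
        return w;
    }

    static double predictProb(double[] w, double[] x) {
        double z = w[w.length - 1];                     // bias term
        for (int i = 0; i < x.length; i++) z += w[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Accuracy on a held-out set; in the sample, this evaluation result
    // is what gets mailed back to all the hospitals.
    double evaluate(double[] w, List<Example> testSet) {
        long correct = testSet.stream()
            .filter(ex -> (predictProb(w, ex.x()) >= 0.5 ? 1 : 0) == ex.label())
            .count();
        return (double) correct / testSet.size();
    }
}
```

Note that no single hospital ever sees another hospital's examples: each batch is only ever combined inside the enclave, and only the trained model's evaluation results leave it.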

Health organizations will need to collaboratively take their sensitive datasets onto the cloud and apply AI/ML analytics to privacy-sensitive patient data. Confidential Computing opens the doors for this collaboration among healthcare organizations. It preserves patient data privacy within a regulated, GDPR-compliant framework, laying the groundwork for advancements in research and clinical trials.

Thanks to Richard, Tamsin, John, and the Conclave Team.

Sneha Damle is a Developer Evangelist at R3, an enterprise blockchain software firm working with a global ecosystem of more than 350 participants across multiple industries from both the private and public sectors to develop on Corda, its open-source blockchain platform; Corda Enterprise, a commercial version of Corda for enterprise usage; and Conclave, a confidential computing platform.
