Sharing Data and Knowledge Without Compromising Privacy

Many companies hold valuable data but lack the expertise to use it effectively. Other organizations have the skills to build machine learning models but need access to diverse datasets to make those models robust. Collaboration benefits both sides. However, there's a catch: each side wants to keep its sensitive information private. Data owners want to protect the privacy of their training data. Model owners want to keep their models and training methods confidential, as these may contain valuable intellectual property. Existing solutions, like federated learning and split learning, fall short of meeting both privacy needs at the same time.

Enter Citadel, a system designed to address these concerns. It uses Intel SGX, a technology that creates secure enclaves where sensitive data can be processed without being exposed. Citadel runs distributed training across multiple training enclaves, each acting on behalf of a data owner, plus an aggregator enclave acting on behalf of the model owner. To prevent data or model leakage during training, Citadel employs zero-sum masking and hierarchical aggregation, which together create a strong information barrier between the enclaves. Compared to other SGX-protected training systems, Citadel offers better scalability and stronger privacy guarantees. Cloud deployment tests with various machine learning models showed that Citadel can scale to a large number of enclaves with only a small slowdown attributable to SGX. Two caveats apply. First, the guarantees of such a system depend on the trustworthiness of the underlying hardware and software. Second, the SGX overhead measured in those tests may not carry over to every workload. Nonetheless, Citadel takes a significant step towards enabling secure and private collaborative machine learning.
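To make the idea of zero-sum masking and hierarchical aggregation more concrete, here is a minimal Python sketch. It is an illustration of the general technique, not Citadel's actual protocol: the function names, the integer encoding of updates, the modulus, and the choice to generate all masks in one place are assumptions made for this example. Each participant adds a random mask to its update, the masks are built so that they sum to zero, and the sum of the masked updates therefore equals the sum of the real updates even though no single masked update reveals anything on its own.

```python
import secrets

MODULUS = 2**32  # assumed ring size; updates are treated as integers mod MODULUS


def zero_sum_masks(num_parties, length, modulus=MODULUS):
    """Return num_parties random vectors whose element-wise sum is 0 mod modulus."""
    if num_parties < 2:
        raise ValueError("zero-sum masking needs at least two parties")
    masks = [[secrets.randbelow(modulus) for _ in range(length)]
             for _ in range(num_parties - 1)]
    # The last mask cancels the sum of all the others.
    masks.append([(-sum(column)) % modulus for column in zip(*masks)])
    return masks


def mask_update(update, mask, modulus=MODULUS):
    """Hide one participant's update behind its mask."""
    return [(u + m) % modulus for u, m in zip(update, mask)]


def aggregate(vectors, modulus=MODULUS):
    """Element-wise sum of a group of vectors, mod modulus."""
    return [sum(column) % modulus for column in zip(*vectors)]


def hierarchical_aggregate(vectors, fan_in=2, modulus=MODULUS):
    """Sum vectors as a tree: groups of fan_in are combined level by level."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(vectors)
    while len(level) > 1:
        level = [aggregate(level[i:i + fan_in], modulus)
                 for i in range(0, len(level), fan_in)]
    return level[0]


# Three hypothetical participants with toy integer updates.
updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = zero_sum_masks(len(updates), 3)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]

# The masks cancel only once every masked update has been summed.
total = hierarchical_aggregate(masked)
print(total)  # [111, 222, 333]
```

Working modulo a fixed integer makes the masks cancel exactly. Real model updates are floating-point, so a practical system would need some fixed-point encoding, which this sketch leaves out. Intermediate nodes in the tree see only partial sums that are still masked, which is the intuition behind combining the two techniques.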
