
HPC-native 5-Safe Trusted Research Environment for Scalable Medical Data Science

Source: DataCite Commons · Updated: 2026-03-13 · Indexed: 2026-05-03
Download link:
https://data.goettingen-research-online.de/citation?persistentId=doi:10.25625/NSLSG7
Description:
The increasing digitalization of healthcare in Germany presents significant opportunities for data-driven medical research, as more and more data becomes available to researchers. However, traditional data export models pose substantial security risks once the data leaves the clinical premises. As regulatory pressure increases, with the European Health Data Space (EHDS) mandating secure processing environments for secondary data use, so-called Trusted Research Environments (TREs) have emerged as a solution. TREs represent a fundamental shift: instead of letting researchers download data directly, they grant access to a highly secure remote computing environment in which sensitive data can be analyzed safely and from which no data can be exported without manual review. At the same time, the adoption of machine learning in medical research demands significant computational power. Classical High-Performance Computing (HPC) environments, however, lack the security and isolation required for sensitive medical data. This thesis bridges that gap by presenting the design, implementation, and operation of a new HPC-native TRE. Researchers access the environment through a secure, browser-based Linux Virtual Desktop Infrastructure (VDI). Standard programming environments such as JupyterLab (Python) and RStudio are pre-installed, and the environment can be extended with additional software if needed; no internet access is available. The architecture is built on the Five Safes framework and guided by a zero-trust security model, ensuring that sensitive data remains permanently end-to-end encrypted. Beyond the technical implementation of an HPC-based TRE, this thesis also provides an organizational framework for implementing the Five Safes within the specific context of HPC and Germany's medical data infrastructure.
Key technical contributions for the Safe Settings environment include a streaming, client-side encrypting uploader for resumable terabyte-scale uploads; a purpose-built Key Management System (KMS) using an envelope encryption scheme that keeps data protected even from administrators; and an extended fork of the gocryptfs filesystem that enables per-file, multi-user cryptographic access control and a full audit log. Organizational contributions include defined workflows for data ingestion (Safe Data, Safe Projects), outlines for user agreements and training curricula (Safe People), and an output review process that integrates directly with existing German Data Integration Centers (DIZ). Finally, this work provides an in-depth security analysis covering standard threat models and worst-case disaster scenarios. As the first HPC-native TRE in Germany and, to our knowledge, the first to implement a user-exclusive, end-to-end envelope encryption scheme, this work provides a valuable blueprint for the design of similar high-security research environments for compute-intensive workloads. By centralizing the administrative and technical load, this TRE offers a more productive environment, equipping scientists with powerful computational resources and GPU access for machine learning. Ultimately, this enables better medical data science and accelerates research progress.
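The general idea of envelope encryption with per-file, multi-user access can be illustrated with a short Python sketch. It is not the thesis's KMS or its gocryptfs fork. It assumes the third-party `cryptography` package (AES-256-GCM via `AESGCM`), a random 256-bit key-encryption key (KEK) per user that only that user holds, and an in-memory audit list. All names (`encrypt_file`, `grant`, `read_file`, `wrapped_deks`, `AUDIT_LOG`) are hypothetical. Each file is encrypted once under its own data-encryption key (DEK), and access is granted by storing a copy of that DEK wrapped under the recipient's KEK. Under these assumptions, an administrator who sees only ciphertexts and wrapped keys cannot read the data.

```python
# Hypothetical sketch of per-file envelope encryption; not the thesis's implementation.
# Assumes the third-party `cryptography` package (pip install cryptography).
import os
from dataclasses import dataclass, field

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the recommended size for AES-GCM

# Hypothetical in-memory audit trail of successful operations: (action, file name, user).
AUDIT_LOG: list[tuple[str, str, str]] = []


def _seal(key: bytes, plaintext: bytes, aad: bytes) -> bytes:
    """AES-256-GCM encrypt with a fresh random nonce; returns nonce || ciphertext || tag."""
    nonce = os.urandom(NONCE_LEN)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, aad)


def _open(key: bytes, blob: bytes, aad: bytes) -> bytes:
    """Inverse of _seal; raises cryptography.exceptions.InvalidTag on a wrong key or tampering."""
    return AESGCM(key).decrypt(blob[:NONCE_LEN], blob[NONCE_LEN:], aad)


def _wrap_aad(file_name: str, user: str) -> bytes:
    # Binds each wrapped DEK to one (file, user) pair so it cannot be swapped elsewhere.
    return f"{file_name}:{user}".encode()


@dataclass
class EncryptedFile:
    name: str
    ciphertext: bytes  # file content encrypted under the per-file DEK
    # user -> DEK sealed under that user's KEK (assumed layout)
    wrapped_deks: dict[str, bytes] = field(default_factory=dict)


def encrypt_file(name: str, data: bytes, owner: str, owner_kek: bytes) -> EncryptedFile:
    dek = AESGCM.generate_key(bit_length=256)  # fresh data-encryption key per file
    f = EncryptedFile(name, _seal(dek, data, name.encode()))
    f.wrapped_deks[owner] = _seal(owner_kek, dek, _wrap_aad(name, owner))
    return f


def grant(f: EncryptedFile, grantor: str, grantor_kek: bytes, user: str, user_kek: bytes) -> None:
    # Only a holder of a wrapped DEK can share it; the file content is not re-encrypted.
    dek = _open(grantor_kek, f.wrapped_deks[grantor], _wrap_aad(f.name, grantor))
    f.wrapped_deks[user] = _seal(user_kek, dek, _wrap_aad(f.name, user))
    AUDIT_LOG.append(("grant", f.name, user))


def read_file(f: EncryptedFile, user: str, user_kek: bytes) -> bytes:
    if user not in f.wrapped_deks:
        raise PermissionError(f"{user} has no key for {f.name}")
    dek = _open(user_kek, f.wrapped_deks[user], _wrap_aad(f.name, user))
    data = _open(dek, f.ciphertext, f.name.encode())
    AUDIT_LOG.append(("read", f.name, user))
    return data
```

In this sketch, sharing a file never re-encrypts its content; only a small wrapped key is added. Deleting an entry from `wrapped_deks` blocks future unwrapping, but a user who has already unwrapped the DEK could still decrypt the existing ciphertext, so a real revocation would re-encrypt the file under a fresh DEK.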
Provided by:
GRO.data
Created:
2025-11-04