
MDSplusML Project Progress and Revised Plan

Source: NIAID Data Ecosystem, indexed 2026-05-10
Download link:
https://doi.org/10.7910/DVN/THJF4X
Description:
The MDSplusML project set out to modernize fusion-experiment data access by improving performance, usability, and compliance with the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. Initial benchmarks on our on-premises systems compared three access methods (the legacy distributed client, the thin client, and direct HDF5/HSDS access) using a representative machine-learning workload of ten thousand shots. We found that network transaction latency, not expression-evaluation complexity, dominated data retrieval times. Adopting the thin-client protocol reduced a multi-hour bulk download to tens of minutes, and raw HDF5 reads matched local-disk speeds. HSDS underperformed, which motivates ongoing optimization. Guided by these findings, we have refined our roadmap. On-site users will use enhanced thin-client APIs (getMany, getManyMany) to batch requests efficiently. For cloud distribution, we propose a “frozen-signals” service: precomputed, read-only snapshots of user-tagged data that prune unnecessary nodes to control file size and eliminate run-time evaluation overhead. Prototypes in Python and C demonstrate that, once a frozen HDF5 file has been downloaded from S3, analyses on it proceed at native speed, irrespective of network latency. Finally, to support cross-machine collaboration, we are aligning our curated datasets with IMAS standards and evaluating repository platforms such as InvenioRDM for scalable, FAIR-compliant archiving. These efforts establish a high-performance, user-centered MDSplus ecosystem that meets both current and emerging needs in fusion data science.
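The claim that round-trip latency, rather than evaluation cost, dominates retrieval time can be illustrated with a small back-of-the-envelope model. The sketch below is not the MDSplus API and does not reproduce the project's benchmarks. Only the shot count (ten thousand) comes from the description above. The signal count, latency, and per-signal transfer time are hypothetical values chosen for illustration, and the function names are made up for this example.

```python
import math


def per_signal_cost(n_shots, n_signals, latency_s, payload_s):
    """Total time when every signal of every shot costs one network round trip."""
    return n_shots * n_signals * (latency_s + payload_s)


def batched_cost(n_shots, n_signals, latency_s, payload_s, batch_size):
    """Total time when signals are grouped into batches per shot.

    Latency is paid once per batch; the transfer time per signal is unchanged.
    """
    trips_per_shot = math.ceil(n_signals / batch_size)
    return n_shots * (trips_per_shot * latency_s + n_signals * payload_s)


N_SHOTS = 10_000    # workload size stated in the description
N_SIGNALS = 20      # hypothetical: signals requested per shot
LATENCY_S = 0.05    # hypothetical: seconds of latency per round trip
PAYLOAD_S = 0.002   # hypothetical: seconds to transfer one signal

unbatched = per_signal_cost(N_SHOTS, N_SIGNALS, LATENCY_S, PAYLOAD_S)
batched = batched_cost(N_SHOTS, N_SIGNALS, LATENCY_S, PAYLOAD_S,
                       batch_size=N_SIGNALS)

print(f"one request per signal: {unbatched / 3600:.2f} h")
print(f"one batch per shot:     {batched / 60:.1f} min")
```

In this model the transfer term is identical in both cases, so the whole difference comes from the number of round trips. That is the effect a batching call such as getMany is meant to exploit.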
Created:
2025-09-22