ProteinNet3D

DataCite Commons | Updated 2026-02-23 | Indexed 2026-05-05
Download link:
https://rodare.hzdr.de/record/4515
Description:
ProteinNet3D is a curated large-scale dataset of 3D macromolecular density volumes designed to support representation learning and benchmarking in structural biology. The dataset is derived from the publicly available Electron Microscopy Data Bank (EMDB), a comprehensive repository of experimentally determined cryo-electron microscopy (cryo-EM) maps spanning diverse macromolecules, molecular assemblies, and subcellular structures. ProteinNet3D focuses specifically on individual macromolecules resolved by single-particle analysis (SPA) or subtomogram averaging (STA), ensuring methodological consistency across samples. To emphasize biologically meaningful structures while avoiding extreme cases, entries were restricted to a molecular weight range of 100–1500 kDa. This criterion excludes small domains and excessively large complexes, resulting in a dataset well-suited for learning size-robust structural representations. All volumes are standardized through isotropic resampling, spatial normalization to a fixed grid (64³ voxels), and intensity normalization to zero mean and unit variance. Background regions are masked using annotated contour levels to reduce noise contributions. To enhance diversity and rotational invariance, each structure is augmented with multiple random 3D rotations. Overall, ProteinNet3D comprises 26,110 processed samples and captures substantial structural heterogeneity, experimental variability, and realistic noise characteristics, making it a rigorous benchmark for 3D deep learning in cryo-EM.
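The standardization steps named in the description (contour masking, isotropic resampling, fitting to a 64³ grid, random 3D rotation, and zero-mean/unit-variance normalization) can be illustrated with a minimal NumPy/SciPy sketch. This is not the dataset authors' pipeline: the function names, the order of operations, the linear interpolation, the choice to scale the longest axis to fill the grid, and the number of rotations per structure are all assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial.transform import Rotation

GRID = 64  # target edge length in voxels, as stated in the description


def center_fit(vol, grid=GRID):
    """Center-crop or zero-pad each axis so the result is a grid^3 cube."""
    out = np.zeros((grid, grid, grid), dtype=vol.dtype)
    src, dst = [], []
    for n in vol.shape:
        m = min(n, grid)
        src.append(slice((n - m) // 2, (n - m) // 2 + m))
        dst.append(slice((grid - m) // 2, (grid - m) // 2 + m))
    out[tuple(dst)] = vol[tuple(src)]
    return out


def standardize(volume, voxel_size, contour_level, grid=GRID):
    """Hypothetical helper: mask background, resample isotropically, fit to a cube."""
    vol = np.asarray(volume, dtype=np.float32)
    # Assumption: voxels below the annotated contour level are zeroed first.
    vol = np.where(vol >= contour_level, vol, np.float32(0.0))
    # voxel_size may be a scalar or one value per axis (physical units).
    voxel = np.broadcast_to(np.asarray(voxel_size, dtype=float), (3,))
    extent = np.array(vol.shape) * voxel
    # Assumption: one isotropic spacing chosen so the longest axis spans the grid.
    spacing = extent.max() / grid
    vol = ndimage.zoom(vol, voxel / spacing, order=1)
    return center_fit(vol, grid)


def random_rotate(vol, rng):
    """Rotate about the volume center by a uniformly random 3D rotation."""
    # A normalized Gaussian 4-vector is a uniformly distributed unit quaternion;
    # from_quat normalizes its input.
    rot = Rotation.from_quat(rng.normal(size=4)).as_matrix()
    center = (np.array(vol.shape) - 1) / 2.0
    offset = center - rot @ center
    return ndimage.affine_transform(
        vol, rot, offset=offset, order=1, mode="constant", cval=0.0
    )


def normalize(vol, eps=1e-8):
    """Scale intensities to zero mean and unit variance."""
    vol = vol.astype(np.float32)
    return (vol - vol.mean()) / (vol.std() + eps)


# Synthetic stand-in for an EMDB map: a noisy Gaussian blob.
rng = np.random.default_rng(0)
z, y, x = np.mgrid[:40, :50, :60]
blob = np.exp(-((z - 20) ** 2 + (y - 25) ** 2 + (x - 30) ** 2) / (2 * 8.0 ** 2))
raw = blob + 0.05 * rng.normal(size=blob.shape)

cube = standardize(raw, voxel_size=1.2, contour_level=0.5)
samples = [normalize(random_rotate(cube, rng)) for _ in range(4)]
print(cube.shape, len(samples))
```

In this sketch the rotation is applied before intensity normalization so that the zero fill value introduced at the cube corners matches the masked background. Whether the published samples were produced in that order is not stated in the description.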
Provider:
Rodare
Created:
2026-02-18