Supporting data for "CryoDataBot: a pipeline to curate cryoEM datasets for AI-driven structural biology"
Source: DataCite Commons. Updated: 2025-10-14. Indexed: 2026-05-03.
Download link:
http://gigadb.org/dataset/102765
Description:
CryoDataBot was developed to address the growing need for reproducible, customizable cryoEM datasets in AI-driven structural biology. As deep learning models become increasingly central to atomic structure modeling, researchers face significant challenges in assembling high-quality training data. Existing workflows are often manual and inconsistent, and they lack standardized quality control, which makes data preparation time-consuming and dataset quality variable. CryoDataBot aims to fill this gap with an automated, modular pipeline that simplifies dataset construction and supports diverse modeling objectives.

The pipeline integrates four key steps: cryoEM map retrieval, voxel resampling, structural labeling, and subvolume partitioning. It also offers fine-grained quality control features, including residue-level similarity filtering, Q-score thresholds, and map-model fitness evaluation (MMF), which together help eliminate redundant or low-quality entries. Users can configure these parameters through a graphical interface to generate task-specific datasets, and intermediate outputs are retained to support reproducibility and debugging.

This workflow enables researchers to efficiently construct high-quality datasets tailored to the specific requirements of their AI models. It supports training and fine-tuning of deep learning architectures and provides a reproducible framework for building task-specific cryoEM datasets. By offering flexible resolution support and standardized formatting, CryoDataBot helps streamline model development and promotes consistent data preparation practices in AI-driven structural biology.
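Two of the steps named in the description, threshold-based quality filtering and subvolume partitioning, can be illustrated with a minimal Python sketch. This is not CryoDataBot's code or API: the `Entry` record, the function names, the threshold values, and the cube size and stride are all hypothetical and chosen only for illustration. The sketch also assumes that a higher MMF value indicates a better fit, which the description does not state.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Entry:
    """Hypothetical record for one map/model pair (not CryoDataBot's schema)."""
    entry_id: str
    q_score: float  # mean Q-score of the fitted model; higher is better
    mmf: float      # map-model fitness; ASSUMED here that higher is better


def filter_entries(entries, min_q_score, min_mmf):
    """Keep only entries that meet both quality thresholds."""
    return [e for e in entries if e.q_score >= min_q_score and e.mmf >= min_mmf]


def subvolume_origins(shape, size, stride):
    """Return the (z, y, x) origins of cubic subvolumes tiling a 3D grid.

    Origins advance by `stride` along each axis. One extra origin is added
    per axis when needed so that the last cube ends flush with the grid edge.
    """
    axes = []
    for dim in shape:
        if dim < size:
            raise ValueError("grid is smaller than the subvolume size")
        starts = list(range(0, dim - size + 1, stride))
        if starts[-1] != dim - size:
            starts.append(dim - size)
        axes.append(starts)
    return list(product(*axes))


# Illustrative values only; real thresholds depend on the modeling task.
entries = [
    Entry("a", q_score=0.6, mmf=0.7),
    Entry("b", q_score=0.3, mmf=0.9),
    Entry("c", q_score=0.7, mmf=0.2),
]
kept = filter_entries(entries, min_q_score=0.5, min_mmf=0.5)
origins = subvolume_origins(shape=(100, 64, 64), size=64, stride=32)
```

A stride smaller than the cube size produces overlapping subvolumes, which is one common way to avoid losing density near cube boundaries. Whether CryoDataBot uses overlap is not stated in the description.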
Provider:
GigaScience Database
Created:
2025-10-14



