Data Archiving and Access for NaFM: Pre-training a Foundation Model for Small-Molecule Natural Products

Indexed by NIAID Data Ecosystem on 2026-05-02

Download link:

https://figshare.com/articles/dataset/Data_Archiving_and_Access_for_NaFM_Pre-training_a_Foundation_Model_for_Small-Molecule_Natural_Products/28980254

Resource description:

pretrain_smiles.pkl: Preprocessed data used for model pretraining. The original data was obtained from the COCONUT database: https://coconut.naturalproducts.net/

classification_data.csv: Data prepared for the Natural Product Taxonomy Classification experiment. The original dataset was sourced from the following archive: https://zenodo.org/records/5068687#.YOKJQOgzaUl

NPClassifier_dataset_refreshed.csv: Data curated for direct comparison with NPClassifier. The original data is available at: https://github.com/mwang87/NP-Classifier/tree/master/training/Data/NPClassifier_dataset.xlsx

regression_data.csv: Dataset used for natural product bioactivity prediction tasks. The original data was retrieved from the NPASS database: https://bidd.group/NPASS/

lotus_data.csv: Data prepared for biological source prediction and related mining tasks. The source data was collected from the LOTUS database: https://lotus.naturalproducts.net/

bgc_data.csv: Dataset constructed for biosynthetic gene cluster mining. The original sources include the MIBiG database (https://mibig.secondarymetabolites.org/) and Pfam (http://pfam.xfam.org/)

external_data.csv: Dataset used for bioactivity screening of natural products. The original data was obtained from the NPASS database: https://bidd.group/NPASS/
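
The description above names seven files but does not document their columns or the structure of the pickled object. As a minimal sketch for a first look at a downloaded copy, the helper below reports a CSV's header and first rows, or the Python type of the pickled object, without assuming any schema. The `FILES` mapping restates the description; the `peek` helper and its parameters are hypothetical and not part of the archive. If the pickle was written with a third-party library, that library would also need to be installed before loading.

```python
import csv
import pickle
from pathlib import Path

# File names and purposes restated from the resource description.
FILES = {
    "pretrain_smiles.pkl": "model pretraining",
    "classification_data.csv": "natural product taxonomy classification",
    "NPClassifier_dataset_refreshed.csv": "comparison with NPClassifier",
    "regression_data.csv": "bioactivity prediction",
    "lotus_data.csv": "biological source prediction",
    "bgc_data.csv": "biosynthetic gene cluster mining",
    "external_data.csv": "bioactivity screening",
}


def peek(path, n_rows=3):
    """Summarize one file without assuming its schema (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            header = next(reader, [])  # empty list if the file has no rows
            rows = [row for _, row in zip(range(n_rows), reader)]
        return {"kind": "csv", "header": header, "rows": rows}
    if path.suffix == ".pkl":
        # Unpickling can execute arbitrary code; only load files you trust.
        with path.open("rb") as fh:
            obj = pickle.load(fh)
        return {"kind": "pickle", "type": type(obj).__name__}
    raise ValueError(f"unsupported file type: {path.suffix}")
```

Calling `peek("classification_data.csv")` on a local copy would show which columns are actually present before any task-specific loading code is written.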

Created:

2025-05-09
