Data Archiving and Access for NaFM: Pre-training a Foundation Model for Small-Molecule Natural Products
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Data_Archiving_and_Access_for_NaFM_Pre-training_a_Foundation_Model_for_Small-Molecule_Natural_Products/28980254
Resource description:
pretrain_smiles.pkl: Preprocessed data used for model pretraining. The original data was obtained from the COCONUT database: https://coconut.naturalproducts.net/
classification_data.csv: Data prepared for the Natural Product Taxonomy Classification experiment. The original dataset was sourced from the following archive: https://zenodo.org/records/5068687#.YOKJQOgzaUl
NPClassifier_dataset_refreshed.csv: Data curated for direct comparison with NPClassifier. The original data is available at: https://github.com/mwang87/NP-Classifier/tree/master/training/Data/NPClassifier_dataset.xlsx
regression_data.csv: Dataset used for natural product bioactivity prediction tasks. The original data was retrieved from the NPASS database: https://bidd.group/NPASS/
lotus_data.csv: Data prepared for biological source prediction and related mining tasks. The source data was collected from the LOTUS database: https://lotus.naturalproducts.net/
bgc_data.csv: Dataset constructed for biosynthetic gene cluster mining. The original sources include the MIBiG database (https://mibig.secondarymetabolites.org/) and Pfam (http://pfam.xfam.org/).
external_data.csv: Dataset used for bioactivity screening of natural products. The original data was obtained from the NPASS database: https://bidd.group/NPASS/
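The files described above can likely be read with Python's standard library alone. The following is a minimal sketch, not the authors' loading code. It assumes that the .pkl file is an ordinary Python pickle and that the .csv files are comma-separated with a header row; the listing states neither, and it does not document the column names or the pickled object's type. The helper name `load_archive_file` is hypothetical.

```python
import csv
import pickle
from pathlib import Path


def load_archive_file(path):
    """Load one downloaded archive file based on its extension.

    Assumption: .pkl files are standard Python pickles and .csv files
    are comma-separated with a header row. Inspect the result before
    relying on its structure.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".pkl":
        # Unpickling can execute arbitrary code; only load trusted files.
        with path.open("rb") as handle:
            return pickle.load(handle)
    if suffix == ".csv":
        # Each row becomes a dict keyed by the header row's column names.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    raise ValueError(f"Unsupported file type: {path.name}")
```

For example, `load_archive_file("pretrain_smiles.pkl")` would return whatever object was pickled, and `load_archive_file("regression_data.csv")` would return a list of row dictionaries. Checking the type of the first result and the keys of the first row is a sensible first step, since the schema is not described here.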
Created:
2025-05-09



