Data Archiving and Access for NaFM: Pre-training a Foundation Model for Small-Molecule Natural Products
Source: Figshare | Updated: 2025-05-09 | Indexed: 2026-04-08
Download link:
https://figshare.com/articles/dataset/Data_Archiving_and_Access_for_NaFM_Pre-training_a_Foundation_Model_for_Small-Molecule_Natural_Products/28980254/1
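The files can also be fetched programmatically. The following is a minimal sketch using only the Python standard library. It assumes that Figshare's public REST API v2 (`https://api.figshare.com/v2/articles/{id}`) returns an article record whose `files` list gives each file's `name` and `download_url`. The article ID 28980254 is taken from the URL above. The helper names and the User-Agent string are hypothetical.

```python
import json
import shutil
import urllib.request
from pathlib import Path

# Public Figshare REST API base (assumed v2); article ID taken from the dataset URL.
FIGSHARE_API = "https://api.figshare.com/v2"
ARTICLE_ID = 28980254
HEADERS = {"User-Agent": "nafm-data-fetch/0.1"}  # hypothetical client identifier


def article_endpoint(article_id: int) -> str:
    """Build the API URL for a public Figshare article record."""
    return f"{FIGSHARE_API}/articles/{article_id}"


def parse_file_list(payload: str) -> dict:
    """Map file name -> download URL, assuming the record carries a 'files' list."""
    record = json.loads(payload)
    return {f["name"]: f["download_url"] for f in record.get("files", [])}


def fetch_file_list(article_id: int = ARTICLE_ID, timeout: float = 30.0) -> dict:
    """Fetch the article record over HTTPS and return {file name: download URL}."""
    req = urllib.request.Request(article_endpoint(article_id), headers=HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_file_list(resp.read().decode("utf-8"))


def download(url: str, dest, timeout: float = 300.0) -> Path:
    """Stream one file to disk without holding it fully in memory."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

For example, `for name, url in fetch_file_list().items(): download(url, Path("nafm_data") / name)` would copy every listed file into a local `nafm_data/` directory. The directory name is only illustrative. Parsing is kept separate from fetching, so the JSON handling can be checked without network access.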
Description:
pretrain_smiles.pkl: Preprocessed data used for model pretraining. The original data was obtained from the COCONUT database: https://coconut.naturalproducts.net/
classification_data.csv: Data prepared for the Natural Product Taxonomy Classification experiment. The original dataset was sourced from the following archive: https://zenodo.org/records/5068687#.YOKJQOgzaUl
NPClassifier_dataset_refreshed.csv: Data curated for direct comparison with NPClassifier. The original data is available at: https://github.com/mwang87/NP-Classifier/tree/master/training/Data/NPClassifier_dataset.xlsx
regression_data.csv: Dataset used for natural product bioactivity prediction tasks. The original data was retrieved from the NPASS database: https://bidd.group/NPASS/
lotus_data.csv: Data prepared for biological source prediction and related mining tasks. The source data was collected from the LOTUS database: https://lotus.naturalproducts.net/
bgc_data.csv: Dataset constructed for biosynthetic gene cluster mining. The original sources include the MIBiG database (https://mibig.secondarymetabolites.org/) and Pfam (http://pfam.xfam.org/).
external_data.csv: Dataset used for bioactivity screening of natural products. The original data was obtained from the NPASS database: https://bidd.group/NPASS/
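Below is a minimal loading sketch in Python, using only the standard library. It assumes the files above have been downloaded into one local directory. The description does not document what object `pretrain_smiles.pkl` contains or what columns the CSV files have. The sketch therefore returns the unpickled object as-is and the CSV rows as plain dicts, without interpreting either. `load_dataset` and the other helper names are hypothetical. `pickle.load` can execute arbitrary code, so only unpickle files obtained from the official record.

```python
import csv
import pickle
from pathlib import Path

# File names as listed in the description; their internal schemas are not documented,
# so rows are returned as plain dicts keyed by whatever header each CSV has.
PICKLE_FILES = ["pretrain_smiles.pkl"]
CSV_FILES = [
    "classification_data.csv",
    "NPClassifier_dataset_refreshed.csv",
    "regression_data.csv",
    "lotus_data.csv",
    "bgc_data.csv",
    "external_data.csv",
]


def load_pickle(path: Path):
    """Unpickle a file. Only do this for files from a source you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def load_csv(path: Path) -> list:
    """Read a CSV into a list of row dicts; 'utf-8-sig' tolerates a leading BOM."""
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return list(csv.DictReader(fh))


def load_dataset(data_dir) -> dict:
    """Load every listed file present in data_dir; missing files are skipped."""
    data_dir = Path(data_dir)
    loaded = {}
    for name in PICKLE_FILES + CSV_FILES:
        path = data_dir / name
        if not path.is_file():
            continue
        loaded[name] = load_pickle(path) if path.suffix == ".pkl" else load_csv(path)
    return loaded
```

If pandas is available, `load_csv` could be swapped for `pandas.read_csv`, which returns a DataFrame instead of a list of dicts. Skipping missing files lets the same call work when only the files for one task have been downloaded.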
Provided by:
Ding, Yuheng
Created:
2025-05-09



