Data Archiving and Access for NaFM: Pre-training a Foundation Model for Small-Molecule Natural Products

Source: Figshare. Updated: 2025-05-09. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Data_Archiving_and_Access_for_NaFM_Pre-training_a_Foundation_Model_for_Small-Molecule_Natural_Products/28980254/1
Description:
pretrain_smiles.pkl: Preprocessed data used for model pretraining. The original data was obtained from the COCONUT database: https://coconut.naturalproducts.net/

classification_data.csv: Data prepared for the Natural Product Taxonomy Classification experiment. The original dataset was sourced from the following archive: https://zenodo.org/records/5068687#.YOKJQOgzaUl

NPClassifier_dataset_refreshed.csv: Data curated for direct comparison with NPClassifier. The original data is available at: https://github.com/mwang87/NP-Classifier/tree/master/training/Data/NPClassifier_dataset.xlsx

regression_data.csv: Dataset used for natural product bioactivity prediction tasks. The original data was retrieved from the NPASS database: https://bidd.group/NPASS/

lotus_data.csv: Data prepared for biological source prediction and related mining tasks. The source data was collected from the LOTUS database: https://lotus.naturalproducts.net/

bgc_data.csv: Dataset constructed for biosynthetic gene cluster mining. The original sources include the MIBiG database (https://mibig.secondarymetabolites.org/) and Pfam (http://pfam.xfam.org/)

external_data.csv: Dataset used for bioactivity screening of natural products. The original data was obtained from the NPASS database: https://bidd.group/NPASS/
Provided by:
Ding, Yuheng
Created:
2025-05-09
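
The listing names the archive's files but does not document their internal layout, so a first step after downloading is simply to look inside them. The following is a minimal sketch using only the Python standard library. It assumes the files were saved to a local folder called nafm_data (a hypothetical path), and it makes no assumption about column names or about what kind of object pretrain_smiles.pkl holds. If that pickle was written with a third-party library, that library may need to be installed before it will load.

```python
import csv
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a pickle file and return whatever object it contains."""
    # Only unpickle files from a source you trust: pickle can run code on load.
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def preview_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) without assuming column names."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


if __name__ == "__main__":
    # Hypothetical download folder; adjust to wherever the files were saved.
    data_dir = Path("nafm_data")

    pkl_path = data_dir / "pretrain_smiles.pkl"
    if pkl_path.exists():
        obj = load_pickle(pkl_path)
        # The container type is not documented in the listing, so just report it.
        print(pkl_path.name, type(obj))

    for name in ("classification_data.csv", "regression_data.csv",
                 "lotus_data.csv", "bgc_data.csv", "external_data.csv"):
        csv_path = data_dir / name
        if csv_path.exists():
            header, rows = preview_csv(csv_path, n_rows=3)
            print(name, header, len(rows))
```

Printing only the container type and the CSV headers keeps the sketch independent of the actual schema, which should be checked against the files themselves.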