MOESM1 of Identification of infectious disease-associated host genes using machine learning techniques
Source: Figshare. Updated: 2019-12-27. Indexed: 2026-04-29.
Download link:
https://figshare.com/articles/dataset/MOESM1_of_Identification_of_infectious_disease-associated_host_genes_using_machine_learning_techniques/11470902
Description:
Additional file 1:
Table S1. All curated infectious disease-associated human genes from DisGeNET.
Table S2. All gene names mapped to UniProt IDs using the DisGeNET mapping table.
Table S3. Positive dataset for 10-fold cross-validation.
Table S4. Positive blind dataset (not used in training or testing during the 10-fold cross-validation used to develop the prediction model).
Table S5. All disease-associated reviewed human proteins in DisGeNET.
Table S6. All reviewed human proteins collected from UniProtKB, dated 12/01/2018.
Table S7. All reviewed human proteins not associated with any disease.
Table S8. Negative dataset for 10-fold cross-validation.
Table S9. Negative blind dataset (not used in training or testing during the 10-fold cross-validation used to develop the prediction model).
Table S10. Independent dataset (BeFree text-mined genes from DisGeNET associated with infectious diseases).
Table S11. All human protein-protein interactions (PPIs) from the Human Protein Reference Database (HPRD), Release 9.
Table S12. All unique human proteins in HPRD (Release 9).
Table S13. All mapped human PPIs in UniProt ID format.
Table S14. All mapped unique human proteins in UniProt ID format.
Table S15. Nine topological properties of the protein-protein interaction network built from the HPRD PPI dataset.
Table S16. Feature-wise performance measures on the disease-associated and non-disease-associated protein datasets using a deep neural network classifier.
Table S17. Mixed-feature performance on the disease-associated and non-disease-associated protein datasets.
Table S18. Ten selected features for normalized and filtered PAAC and network properties.
Table S19. Sixteen selected features for PAAC and network properties.
Table S20. Performance measures for the selected features using different classifiers.
Table S21. Prediction results on the independent dataset.
Table S22. Top 100 proteins (genes) predicted by the proposed DNN-based method.
Table S23. Significantly enriched disease-ontology terms for the top 100 proteins (genes), based on the Genetic Association Database (GAD).
Table S24. Significantly enriched gene-ontology biological process terms for the top 100 proteins (genes).
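The split between the cross-validation tables (S3, S8) and the blind tables (S4, S9) can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the 10% blind fraction, the random seed, and the placeholder identifiers are all assumptions made for illustration, and the description does not say how the blind sets were actually chosen.

```python
import random


def blind_and_folds(ids, blind_fraction=0.1, k=10, seed=0):
    """Split ids into a blind hold-out set and k cross-validation folds.

    blind_fraction and seed are illustrative assumptions, not values
    taken from the dataset description.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_blind = int(len(shuffled) * blind_fraction)
    blind = shuffled[:n_blind]
    rest = shuffled[n_blind:]
    # Deal the remaining ids round-robin so fold sizes differ by at most one.
    folds = [rest[i::k] for i in range(k)]
    return blind, folds


def cv_rounds(folds):
    """Yield (train, test) pairs, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Placeholder identifiers, not real UniProt accessions.
positives = [f"POS{i:03d}" for i in range(100)]
blind, folds = blind_and_folds(positives)
rounds = list(cv_rounds(folds))
```

The blind identifiers never appear in any training or test fold, which is the property the description attributes to Tables S4 and S9. The same split would be applied separately to the negative set.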
Created:
2019-12-27



