
Metapath-based heterogeneous bioinformatics network dataset

Source: Mendeley Data; updated 2022-05-08; indexed 2024-06-29
Download link:
https://www.doi.org/10.57760/sciencedb.01726
Description:
The heterogeneous bioinformatics network contains 708 drugs, 1512 proteins, 5603 diseases and 4192 side effects. Its nodes represent four entity types: drug (0), protein (1), disease (2), and side effect (3). Its edges represent six association types: drug-drug [DrugBank (Version 3.0)], drug-protein [DrugBank (Version 3.0)], drug-disease (Comparative Toxicogenomics Database), drug-side effect [SIDER database (version 2)], protein-protein [HPRD database (Release 9)], and protein-disease (Comparative Toxicogenomics Database).

The constructed metapath instance dataset can be used for drug-target interaction prediction and drug repositioning research. It consists of seven metapath types: drug-drug, drug-protein-drug, drug-disease-drug, drug-side effect-drug, protein-protein, protein-drug-protein, and protein-disease-protein.

The data processing flow is as follows. We first use the Jaccard similarity coefficient to calculate the similarity of each drug-drug pair from three association networks (drug-drug, drug-disease and drug-side effect), obtaining similarity matrices based on interaction, disease and side effect respectively. We then apply thresholds: the drug chemical structure similarity matrix (chemical structure similarities > 0.5) is combined with the drug-drug similarities (interaction-related similarities > 0.5, disease-related similarities > 0.5 and side effect-related similarities > 0.5) and the known drug-drug interactions to obtain the drug-drug associations.

Similarly, the protein-protein and protein-disease association networks are used to calculate the similarity of each protein-protein pair, giving similarity matrices based on interaction and disease respectively. After combining the protein sequence similarity matrix (sequence similarities > 0.5), we apply thresholds to integrate the protein-protein similarities (interaction-related similarities > 0.5, disease-related similarities > 0.8) and the known protein-protein interactions to obtain the protein-protein associations. In this way, drug-drug and protein-protein similarity information is added to the constructed bioinformatics network. Finally, we obtain the metapath-based bioinformatics network dataset by building the association dictionaries.
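
The processing flow described above can be illustrated with a minimal Python sketch. It is not the dataset authors' code: the function names (`jaccard`, `similar_pairs`, `invert`, `metapath_instances`), the toy IDs, and the dictionary-of-sets layout are all hypothetical. It also assumes that "integrating" the thresholded similarities means taking the union of the resulting pairs, and that a metapath instance is counted once per direction; the description does not state either point.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity coefficient |a & b| / |a | b| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(assoc, threshold):
    """Pairs of keys whose neighbour sets have Jaccard similarity > threshold."""
    return {
        (u, v)
        for u, v in combinations(sorted(assoc), 2)
        if jaccard(assoc[u], assoc[v]) > threshold
    }


def invert(assoc):
    """Turn an a -> {b} association dictionary into b -> {a}."""
    inverse = {}
    for key, values in assoc.items():
        for value in values:
            inverse.setdefault(value, set()).add(key)
    return inverse


def metapath_instances(first_hop, second_hop):
    """Enumerate instances (a, b, c) of a two-hop metapath with a != c.

    first_hop maps a -> {b}; second_hop maps b -> {c}.
    """
    return sorted(
        (a, b, c)
        for a, mids in first_hop.items()
        for b in mids
        for c in second_hop.get(b, ())
        if c != a
    )


# Toy association dictionaries (hypothetical IDs, not taken from the dataset).
drug_disease = {"d0": {"x", "y", "z"}, "d1": {"x", "y"}, "d2": {"w"}}
drug_side_effect = {"d0": {"s1"}, "d1": {"s2"}, "d2": {"s1"}}
known_interactions = {("d1", "d2")}

# Assumption: integration = union of thresholded similarity pairs
# and the known interactions (thresholds of 0.5 as in the description).
drug_drug = (
    similar_pairs(drug_disease, 0.5)
    | similar_pairs(drug_side_effect, 0.5)
    | known_interactions
)

# drug-protein-drug metapath instances from a toy drug-protein dictionary.
drug_protein = {"d0": {"p0"}, "d1": {"p0", "p1"}, "d2": {"p1"}}
dpd_instances = metapath_instances(drug_protein, invert(drug_protein))
```

The same helpers would apply to the protein side by swapping in protein-protein and protein-disease dictionaries and the thresholds given for proteins. For the full network (708 drugs, 1512 proteins), a matrix-based Jaccard computation would likely be faster than this pairwise loop.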
Created:
2022-05-08