Node connectivity measurements for Hetionet v1.0 metapaths

NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://zenodo.org/record/1435833
Resource description:
Hetionet v1.0 is a hetnet (heterogeneous network) with 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. This record contains computed connectivity measurements for Hetionet v1.0 for all metapaths (types of paths) up to length 3. These measurements are designed to assess the extent of connectivity between two nodes along a given metapath. Three types of data are included:

Path counts: Path counts measure the number of paths from a source node to a target node along a specified metapath. The path count is a special case of the degree-weighted path count (DWPC) metric in which the damping exponent is set to 0.0. Path counts for all source–target node combinations of a given metapath are stored in a matrix with source nodes as rows and target nodes as columns.

Degree-weighted path counts: Like path counts, DWPCs measure the abundance of paths from a source node to a target node along a given metapath, but they are adjusted for the node degrees along each path, so that paths through higher-degree nodes are downweighted according to a damping exponent. The DWPCs here use a damping exponent of 0.5 and the same matrix serialization as the path count datasets. The values are not scaled or transformed. To compare them to the null DWPCs discussed below, divide each value by the mean DWPC for the entire matrix and apply an inverse hyperbolic sine transformation.

Degree-grouped permutation summaries: Degree-grouped permutations (DGP) are used to compute the significance of DWPC values. Specifically, they are used to estimate the null distribution for DWPCs from the unpermuted hetnet. DGP summaries provide summary statistics of DWPCs computed on permuted hetnets. The permuted hetnets are derived from Hetionet v1.0 using the XSwap algorithm, which preserves node degree but randomizes edges to muddle their meaning. DWPCs were computed for 200 permuted networks and grouped by source–target node degree within each metapath. Permuted DWPCs were scaled by dividing by the unpermuted DWPC mean and then inverse hyperbolic sine transformed. Every degree pair for a given metapath has corresponding statistics that summarize its values across permuted hetnets. These statistics include the number of observed DWPCs, the number of nonzero DWPCs, the sum of the DWPCs, and the sum of squared DWPCs. These values are sufficient to calculate the parameters of a gamma-hurdle null DWPC distribution.

Data format: The .zip files are HetMat archive files, meaning that the directory structure and file formats of the archived files conform to the HetMat data structure for storing hetnets on disk. Matrices are stored as scipy.sparse .npz files. .npz is a numpy array serialization format that scipy uses to write sparse matrices to disk.

TSV files in this upload report information on the contents of the archives. The .zip-info.tsv files list all files included in the zip archives. metapath-dwpc-stats.tsv contains summary information on the unpermuted path counts and DWPCs. Results are archived by path length, so all metapaths of length 1 are in a different archive than metapaths of length 2. Users who only need results for shorter metapaths therefore do not need to download the large archives for longer metapaths. There are 24 metapaths of length 1, 242 metapaths of length 2, and 1,939 metapaths of length 3.

Source code: These datasets were computed by the bulk.ipynb notebook from greenelab/hetmech@34e95b9.

Funding: This work was supported through a research collaboration with Pfizer Worldwide Research and Development. This work is funded in part by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grants GBMF4552 and GBMF4560.

More information: See the manuscript titled "Hetnet connectivity search provides rapid insights into how two biomedical entities are related".
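
The relationship between path counts, DWPCs, and the scaling step described above can be illustrated on a toy example. The following is a minimal sketch, not the hetmech implementation. It assumes the commonly published DWPC definition, in which each edge on a path contributes (source degree × target degree) raised to the negative damping exponent, and it only handles metapaths that do not revisit a node type, so no paths with repeated nodes need to be excluded. The toy matrices and all function names are hypothetical.

```python
import math


def degree_weight(adjacency, damping):
    """Downweight each edge by (source degree * target degree) ** -damping."""
    row_degree = [sum(row) for row in adjacency]
    col_degree = [sum(col) for col in zip(*adjacency)]
    return [
        [
            (row_degree[i] * col_degree[j]) ** -damping if edge else 0.0
            for j, edge in enumerate(row)
        ]
        for i, row in enumerate(adjacency)
    ]


def matmul(a, b):
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dwpc_matrix(adjacencies, damping):
    """Chain degree-weighted adjacency matrices along a metapath."""
    result = degree_weight(adjacencies[0], damping)
    for adjacency in adjacencies[1:]:
        result = matmul(result, degree_weight(adjacency, damping))
    return result


def scale_and_transform(matrix):
    """Divide by the mean of the whole matrix, then apply arcsinh."""
    values = [v for row in matrix for v in row]
    mean = sum(values) / len(values)
    return [[math.asinh(v / mean) for v in row] for row in matrix]


# Hypothetical toy hetnet: 2 compounds, 3 genes, 2 diseases.
compound_gene = [[1, 1, 0], [0, 1, 1]]  # rows: compounds, columns: genes
gene_disease = [[1, 0], [1, 1], [0, 1]]  # rows: genes, columns: diseases
metapath = [compound_gene, gene_disease]

path_counts = dwpc_matrix(metapath, damping=0.0)  # damping 0.0 gives raw path counts
dwpc = dwpc_matrix(metapath, damping=0.5)  # damping used in this record
transformed = scale_and_transform(dwpc)  # comparable to the null DWPCs

print(path_counts)
print(dwpc)
print(transformed)
```

The archived matrices are sparse and far larger than this toy, so in practice they would be read with scipy.sparse.load_npz and transformed with numpy.arcsinh rather than nested lists. Note that the mean is taken over the entire matrix, zeros included, as the description specifies.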

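
The description states that the four degree-grouped permutation statistics (number of observed DWPCs, number of nonzero DWPCs, sum, and sum of squares) are sufficient to parameterize a gamma-hurdle null distribution. One way this could work is a method-of-moments fit, sketched below. This is an assumption about the estimation approach, not a statement of how the original pipeline does it, and the function and parameter names are hypothetical rather than the column names used in the archives.

```python
def gamma_hurdle_params(n, nnz, dwpc_sum, dwpc_sum_sq):
    """Method-of-moments fit of a gamma-hurdle model from summary statistics.

    n            number of observed (permuted) DWPCs for a degree pair
    nnz          number of those DWPCs that are nonzero
    dwpc_sum     sum of the DWPCs
    dwpc_sum_sq  sum of the squared DWPCs

    Returns (p_nonzero, shape, rate). shape and rate are None when the
    nonzero values are too few or have no spread, so no gamma can be fitted.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Hurdle part: probability that a null DWPC is nonzero.
    p_nonzero = nnz / n
    if nnz < 2:
        return p_nonzero, None, None
    # Zeros add nothing to either sum, so these are moments of nonzero values.
    mean_nz = dwpc_sum / nnz
    var_nz = (dwpc_sum_sq - dwpc_sum ** 2 / nnz) / (nnz - 1)
    if var_nz <= 0:
        return p_nonzero, None, None
    # Gamma part: mean = shape / rate and variance = shape / rate ** 2.
    rate = mean_nz / var_nz
    shape = mean_nz * rate
    return p_nonzero, shape, rate


# Hypothetical degree pair: 10 permuted DWPCs, of which 3 are nonzero
# with values 1.0, 2.0, and 3.0 (sum 6.0, sum of squares 14.0).
params = gamma_hurdle_params(n=10, nnz=3, dwpc_sum=6.0, dwpc_sum_sq=14.0)
print(params)
```

Under this model, a p-value for an observed (scaled and transformed) DWPC would be the nonzero probability multiplied by the gamma survival function at that value, which could be evaluated with, for example, scipy.stats.gamma.sf using the fitted shape and a scale of 1/rate.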
Created:
2020-08-11