"HiHPO dataset"
DataCite Commons. Updated 2026-04-02. Indexed 2026-05-03.
Download link:
https://ieee-dataport.org/documents/hihpo-dataset-0
Description:
The dataset used in our article "HiHPO: Multimodal Hierarchical Graph Learning for Predicting Missing Protein-Phenotype Associations" supports predicting missing protein-phenotype associations under random and temporal splits. It includes the Human Phenotype Ontology (HPO, hp.obo) and human gene-HPO associations downloaded from the HPO project website (January 16, 2024 release), with gene IDs mapped to UniProt protein IDs. The true path rule was applied to propagate annotations across the HPO hierarchy. The Phenotypic Abnormality (PA) sub-ontology was selected, comprising 4,919 proteins and 10,937 HPO terms. Multimodal protein data include a PPI network from STRING v12 (~170k interactions), tissue-specific gene expression profiles from GTEx (E-MTAB-5214), and protein sequences from UniProt with ESM-2 3B layer-36 embeddings. For the random split, 1,000 proteins were sampled, with 20% of their annotations held out for testing. The temporal split uses annotations before January 16, 2024, for training and new annotations (January 16, 2024–January 16, 2025) for testing.
Provider:
IEEE DataPort
Created:
2026-04-02
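
Illustration:
The description says annotations were propagated across the HPO hierarchy with the true path rule: a protein annotated with a term is also annotated with every ancestor of that term. The following is a minimal sketch of that idea, not the authors' code. The function name `propagate_true_path`, the toy term IDs, and the in-memory `parents` mapping are assumptions made for illustration. Parsing hp.obo and mapping gene IDs to UniProt IDs are not shown.

```python
def propagate_true_path(annotations, parents):
    """Close each protein's term set under the true path rule.

    annotations: dict mapping protein ID -> set of directly annotated term IDs
    parents:     dict mapping term ID -> set of direct parent term IDs
    (Both structures are hypothetical; the dataset's file layout may differ.)
    """
    propagated = {}
    for protein, terms in annotations.items():
        closed = set()
        stack = list(terms)
        # Walk upward through the hierarchy, visiting each term once.
        while stack:
            term = stack.pop()
            if term in closed:
                continue
            closed.add(term)
            stack.extend(parents.get(term, ()))
        propagated[protein] = closed
    return propagated


# Toy hierarchy with placeholder IDs: HP:C is_a HP:B is_a HP:A
parents = {"HP:C": {"HP:B"}, "HP:B": {"HP:A"}}
annotations = {"P1": {"HP:C"}, "P2": {"HP:B"}}
result = propagate_true_path(annotations, parents)
```

In this toy case, `P1` ends up annotated with all three terms and `P2` with `HP:B` and `HP:A`. Because the traversal tracks visited terms, it also handles terms with several parents, which HPO allows.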



