HiHPO dataset

Source: DataCite Commons. Updated 2026-04-02; indexed 2026-05-03.
Download link:
https://ieee-dataport.org/documents/hihpo-dataset-0
Description:
The dataset used in our article "HiHPO: Multimodal Hierarchical Graph Learning for Predicting Missing Protein-Phenotype Associations" supports predicting missing protein-phenotype associations under random and temporal splits. It includes the Human Phenotype Ontology (HPO, hp.obo) and human gene-HPO associations downloaded from the HPO project website (January 16, 2024 release), with gene IDs mapped to UniProt protein IDs. The true-path rule was applied to propagate annotations across the HPO hierarchy. The Phenotypic Abnormality (PA) sub-ontology was selected, comprising 4,919 proteins and 10,937 HPO terms. Multi-modal protein data include a PPI network from STRING v12 (~170k interactions), tissue-specific gene expression profiles from GTEx (E-MTAB-5214), and protein sequences from UniProt with ESM-2 3B layer-36 embeddings. For the random split, 1,000 proteins were sampled, with 20% of their annotations held out for testing. The temporal split uses annotations before January 16, 2024, for training and new annotations (January 16, 2024–January 16, 2025) for testing.
Provider:
IEEE DataPort
Created:
2026-04-02
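
The description mentions two steps that a small example may make more concrete: propagating annotations with the true-path rule, and building a temporal test set from annotations that appear only in a later release. The following is a minimal sketch under stated assumptions, not the authors' pipeline. The function names (`ancestors`, `propagate`, `temporal_split`), the dictionary layouts, and the toy term and protein IDs are all hypothetical, and the dataset's actual file formats are not specified in the listing.

```python
def ancestors(term, parents, cache):
    """Return every ancestor of `term` (excluding itself).

    `parents` maps a term ID to the set of its direct parents (a DAG).
    `cache` memoizes results so shared ancestors are computed once.
    """
    if term in cache:
        return cache[term]
    result = set()
    for parent in parents.get(term, ()):
        result.add(parent)
        result |= ancestors(parent, parents, cache)
    cache[term] = result
    return result


def propagate(annotations, parents):
    """True-path rule: a protein annotated with a term is also
    annotated with all of that term's ancestors."""
    cache = {}
    propagated = {}
    for protein, terms in annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents, cache)
        propagated[protein] = full
    return propagated


def temporal_split(old, new):
    """Assumed reading of a temporal split: train on the older release,
    test on annotations that appear only in the newer release."""
    test = {}
    for protein, terms in new.items():
        added = terms - old.get(protein, set())
        if added:
            test[protein] = added
    return old, test


# Toy ontology with hypothetical IDs (not real HPO terms).
toy_parents = {
    "T:seizure": {"T:nervous"},
    "T:nervous": {"T:root"},
    "T:heart": {"T:root"},
    "T:root": set(),
}
old_release = propagate({"P1": {"T:seizure"}}, toy_parents)
new_release = propagate({"P1": {"T:seizure", "T:heart"}, "P2": {"T:nervous"}}, toy_parents)
train, test = temporal_split(old_release, new_release)
```

In this toy case, `P1` gains only `T:heart` in the test set because its other terms were already present in the older release, while `P2` is entirely new. Whether the split should be computed before or after propagation is a design choice this sketch does not settle; it propagates first.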