PheKnowLator Builds -- CERLIB Challenge
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10052202
Resource description:
PHENOTYPE KNOWLEDGE TRANSLATOR (PHEKNOWLATOR)
2021 Continuous Evaluation of Relational Learning in Biomedicine (CERLIB)
Website: https://github.com/callahantiff/PheKnowLator/wiki
Preprint: https://doi.org/10.48550/arXiv.2307.05727
OVERVIEW
Introduction
Knowledge Graph Builds
Challenge Data
Challenge Relations
Build Updates
INTRODUCTION
PheKnowLator (Phenotype Knowledge Translator) is a Python 3 library that constructs semantically rich, large-scale biomedical knowledge graphs under different semantic models. PheKnowLator is also a data-sharing hub that provides downloadable versions of prebuilt knowledge graphs. For this challenge, the PheKnowLator knowledge graphs were designed to model mechanisms of human disease and were built using 12 open biomedical ontologies, 24 linked open datasets, and results from two large-scale, experimentally derived datasets. For additional information, see the associated GitHub website: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0. For a visual representation of the resources used in the PheKnowLator knowledge graphs (and their relationships), see the link below.
Knowledge Representation: https://user-images.githubusercontent.com/8030363/103718875-ffc04480-4f84-11eb-8852-65e557ba772b.jpg
KNOWLEDGE GRAPH BUILDS
PheKnowLator was designed to generate knowledge graphs under different semantic models and to give users complete flexibility throughout the construction process. PheKnowLator is built on a core set of Open Biomedical Ontologies (OBOs), which are extended with external data sources using different knowledge models. Users can customize the following parameters:
Construction Approach: The semantic model utilized when integrating ontology and non-ontology data. The two available models are instance and subclass (details here: https://bit.ly/3p0ZNgg). We are providing an instance-based build for the challenge.
Relations: A single relation can be added (relations_only) or each relation and its inverse can be added (inverse_relations). We are providing a knowledge graph built with inverse relations for the challenge.
OWL Decoding: An OWL-decoded version of the full semantic knowledge graph. The method that we currently provide is called OWL-NETS (details here: https://bit.ly/35XCP2g). It decodes the triples that are needed to support OWL expressivity but that are not, on their own, biologically meaningful.
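The difference between the relations_only and inverse_relations settings described above can be illustrated with a minimal sketch. This is not PheKnowLator's implementation: the function add_edges, the INVERSES lookup, and the node identifiers are hypothetical names introduced only for illustration. The one fact taken from outside this document is that the Relation Ontology declares RO_0000056 (participates in) and RO_0000057 (has participant) as inverses.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical inverse lookup used only for this illustration.
# RO_0000056 ("participates in") and RO_0000057 ("has participant")
# are declared as inverses in the Relation Ontology.
INVERSES: Dict[str, str] = {"RO_0000056": "RO_0000057"}


def add_edges(edges: Iterable[Triple], inverse_relations: bool) -> List[Triple]:
    """Return each edge, followed by its inverse when requested and known."""
    out: List[Triple] = []
    for subj, rel, obj in edges:
        out.append((subj, rel, obj))
        if inverse_relations and rel in INVERSES:
            out.append((obj, INVERSES[rel], subj))
    return out


# Placeholder node identifiers, not taken from a real build.
edges = [("protein_A", "RO_0000056", "pathway_B")]
relations_only = add_edges(edges, inverse_relations=False)
with_inverses = add_edges(edges, inverse_relations=True)
print(relations_only)
print(with_inverses)
```

Under these assumptions, an inverse_relations build contains both directions of every edge whose relation has a declared inverse. This matters when splitting data for link prediction, because a held-out edge may otherwise be recoverable from its inverse in the training set.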
CHALLENGE DATA
With this information in mind, the Google Cloud Storage Bucket includes the data files listed below. Additional information for each file type can be found here: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0#knowledge-graph-output.
Data for all builds can be found here: https://zenodo.org/record/8173020.
Knowledge Graph Data
PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_OWLNETS.nt
PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_OWLNETS_NetworkxMultiDiGraph.gpickle
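As a rough way to inspect the .nt file without loading the whole graph into memory, the sketch below streams N-Triples lines and counts how often each predicate occurs. It assumes that the triples of interest consist of three IRIs; lines containing literals or blank nodes are skipped. A full RDF parser such as rdflib is the safer choice for anything beyond a quick look. The function name count_predicates and the sample IRIs are illustrative only.

```python
import re
from collections import Counter
from typing import Iterable

# Matches only the simplest N-Triples form: three IRIs followed by a period.
# Lines with literals or blank nodes do not match and are skipped.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def count_predicates(lines: Iterable[str]) -> Counter:
    """Count predicate IRIs over an iterable of N-Triples lines."""
    counts: Counter = Counter()
    for line in lines:
        match = IRI_TRIPLE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


# Made-up sample lines; a real run would iterate over an open file handle.
sample = [
    "<http://example.org/a> <http://purl.obolibrary.org/obo/RO_0000056> <http://example.org/b> .",
    "<http://example.org/c> <http://purl.obolibrary.org/obo/RO_0000056> <http://example.org/d> .",
    '<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "a" .',
]
predicate_counts = count_predicates(sample)
print(predicate_counts.most_common())
```

With a downloaded build, the same function can be called on a file opened with open(path, encoding="utf-8"). For the .gpickle file, note that networkx.read_gpickle was removed in NetworkX 3.0; with recent versions the file would instead be opened in binary mode and read with the standard pickle module, with networkx installed.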
Edge Lists
PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Identifiers.txt
PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Integers.txt
PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Integer_Identifier_Map.json
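The integer edge list and the integer-identifier map are meant to be used together. The sketch below shows one way to translate integer triples back into identifiers. Several details are assumptions that should be checked against the actual files: that the edge list is tab-delimited with a single header row, and that the JSON file maps each identifier to its integer (the sketch inverts that mapping). The function names and the sample data are hypothetical.

```python
import csv
import io
import json
from typing import Dict, Iterator, TextIO, Tuple


def load_int_to_identifier(json_text: str) -> Dict[int, str]:
    """Invert an assumed identifier -> integer map into integer -> identifier."""
    identifier_to_int = json.loads(json_text)
    return {int(idx): identifier for identifier, idx in identifier_to_int.items()}


def decode_triples(
    handle: TextIO, lookup: Dict[int, str]
) -> Iterator[Tuple[str, str, str]]:
    """Yield identifier triples from an assumed tab-delimited integer edge list."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed rows
        subj, rel, obj = (lookup[int(value)] for value in row)
        yield subj, rel, obj


# Made-up stand-ins for the JSON map and the integer edge list.
map_json = json.dumps({"ex:gene_1": 0, "ex:relation": 1, "ex:protein_2": 2})
integers_txt = "subject\tpredicate\tobject\n0\t1\t2\n"

lookup = load_int_to_identifier(map_json)
decoded = list(decode_triples(io.StringIO(integers_txt), lookup))
print(decoded)
```

If the JSON file turns out to map integers to identifiers instead, the inversion step can be replaced by a plain conversion of the keys to int.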
Metadata
Node and Relation Metadata including labels, synonyms, and definitions. Additional information can be found here:
PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_NodeLabels.txt
node_metadata_dict.pkl
CHALLENGE RELATIONS
We will evaluate predictions on 15 Relation Ontology (RO) relations utilized in 34 distinct edge types. Additional details on these edge types can be found here: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0#edge-data. The 15 RO relations and their associated edge types are shown in the table below.
Participates in: chemical-pathway; gene-pathway; protein-gobp; protein-pathway
Has function: pathway-gomf; protein-gomf
Located in: protein-anatomy; protein-cell; protein-gocc; rna-anatomy; rna-cell
Has component: pathway-gocc
Has phenotype: disease-phenotype
Has gene product: gene-protein
Interacts with: chemical-gene; chemical-protein
Genetically interacts with: gene-gene
Molecularly interacts with: chemical-gobp; chemical-gocc; chemical-gomf; protein-catalyst; protein-cofactor; protein-protein
Transcribed to: gene-rna
Ribosomally translates to: rna-protein
Causally influences: variant-gene
Is substance that treats: chemical-disease; chemical-phenotype
Causes or contributes to condition: gene-disease; gene-phenotype; variant-disease; variant-phenotype
Realized in response to: gobp-pathway
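For filtering predictions by edge type, the relations listed above can be encoded as a lookup table. The sketch below transcribes the list as it appears here. The variable names are illustrative, and only the relation labels are used because the RO identifiers are not given in this description.

```python
from typing import Dict, List

# Relation label -> edge types, transcribed from the list above.
CHALLENGE_RELATIONS: Dict[str, List[str]] = {
    "Participates in": ["chemical-pathway", "gene-pathway", "protein-gobp", "protein-pathway"],
    "Has function": ["pathway-gomf", "protein-gomf"],
    "Located in": ["protein-anatomy", "protein-cell", "protein-gocc", "rna-anatomy", "rna-cell"],
    "Has component": ["pathway-gocc"],
    "Has phenotype": ["disease-phenotype"],
    "Has gene product": ["gene-protein"],
    "Interacts with": ["chemical-gene", "chemical-protein"],
    "Genetically interacts with": ["gene-gene"],
    "Molecularly interacts with": [
        "chemical-gobp", "chemical-gocc", "chemical-gomf",
        "protein-catalyst", "protein-cofactor", "protein-protein",
    ],
    "Transcribed to": ["gene-rna"],
    "Ribosomally translates to": ["rna-protein"],
    "Causally influences": ["variant-gene"],
    "Is substance that treats": ["chemical-disease", "chemical-phenotype"],
    "Causes or contributes to condition": [
        "gene-disease", "gene-phenotype", "variant-disease", "variant-phenotype",
    ],
    "Realized in response to": ["gobp-pathway"],
}

# Reverse lookup: edge type -> relation label.
EDGE_TYPE_TO_RELATION: Dict[str, str] = {
    edge_type: relation
    for relation, edge_types in CHALLENGE_RELATIONS.items()
    for edge_type in edge_types
}

n_relations = len(CHALLENGE_RELATIONS)
n_edge_types = len(EDGE_TYPE_TO_RELATION)
print(n_relations, n_edge_types)
print(EDGE_TYPE_TO_RELATION["gene-rna"])
```

As transcribed, the table holds 15 relations and 33 distinct edge types, one fewer than the 34 stated above. The removal of chemical-rna noted under April 2021 may account for the difference, but that is an assumption rather than something this description confirms.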
BUILD UPDATES
Below we note important updates to each build. For additional information on each build please see the project Wiki (https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0) and for more information on the data sources that are used for each build see: https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources.
JANUARY 2021
No data were found for the pathway-gomf edge type. The cause was identified as a change in the input data file, which is downloaded from Reactome
APRIL 2021
Significant updates were made to the workflow for building the graphs. These should have only a marginal impact on the resulting knowledge graphs
Changes made to the edge types from the table above. We are no longer supporting `chemical-rna` edge types
The filtering applied to the input data when constructing the edges was updated to reduce potential variance in the quality of the resulting edges. Please see the descriptions for each data source (here: https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources) for additional information
MAY 2021
pathway-gomf edges are back; Reactome appears to have resolved the errors we discovered in January 2021
disease-phenotype edge data may change slightly as the HPO changed the phenotype annotation file and associated formatting
The gene-gene edge count has decreased drastically. The cause has been identified as changes that GeneMania made to their data (change to origin data noted as 04/27/2021)
JUNE 2021
Build successful. No issues to report
JULY 2021
Build successful. No issues to report
AUGUST 2021
Build successful. No issues to report
SEPTEMBER 2021
Build successful. No issues to report
OCTOBER 2021
Build successful. No issues to report.
Note that some changes were made to the OWL-NETS algorithm, which may result in larger graphs than prior builds. Details on these changes can be found here: https://github.com/callahantiff/PheKnowLator/pull/109
The following issues (https://github.com/callahantiff/PheKnowLator/issues/116; https://github.com/callahantiff/PheKnowLator/issues/118) identified significant bugs that needed immediate repair. As a result, a new October build (v3.0.2) was generated on 10/19/21.
NOVEMBER 2021
Build successful. No issues to report
Created:
2023-10-30



