
PheKnowLator Builds -- CERLIB Challenge

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10052202
Description:
PHENOTYPE KNOWLEDGE TRANSLATOR (PHEKNOWLATOR)
2021 Continuous Evaluation of Relational Learning in Biomedicine (CERLIB)

Website: https://github.com/callahantiff/PheKnowLator/wiki
Preprint: https://doi.org/10.48550/arXiv.2307.05727

OVERVIEW
  Introduction
  Knowledge Graph Builds
  Challenge Data
  Challenge Relations
  Updates

INTRODUCTION
PheKnowLator (Phenotype Knowledge Translator) is a Python 3 library that constructs semantically rich, large-scale biomedical knowledge graphs under different semantic models. PheKnowLator is also a data sharing hub, providing downloadable versions of prebuilt knowledge graphs. For this challenge, the PheKnowLator knowledge graphs have been designed to model mechanisms of human disease and were built using 12 open biomedical ontologies, 24 linked open datasets, and results from two large-scale, experimentally derived datasets. For additional information, see the associated GitHub website: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0.

For a visual representation of the resources used in the PheKnowLator knowledge graphs (and their relationships), see the link below.
Knowledge Representation: https://user-images.githubusercontent.com/8030363/103718875-ffc04480-4f84-11eb-8852-65e557ba772b.jpg

KNOWLEDGE GRAPH BUILDS
PheKnowLator was designed to generate knowledge graphs under different semantic models and to provide users with complete flexibility throughout the construction process. PheKnowLator is built on a core set of Open Biomedical Ontologies (OBOs), which are extended with external data sources by utilizing different knowledge models. The software allows users to customize the following parameters:

  Construction Approach: The semantic model utilized when integrating ontology and non-ontology data. The two available models are instance and subclass (details here: https://bit.ly/3p0ZNgg). We are providing an instance-based build for the challenge.
  Relations: A single relation can be added (relations_only) or each relation and its inverse can be added (inverse_relations). We are providing a knowledge graph built with inverse relations for the challenge.
  OWL Decoding: An OWL-decoded version of the full semantic knowledge graph. The method that we currently provide is called OWL-NETS (details here: https://bit.ly/35XCP2g), which decodes all triples that are needed to support OWL expressivity but that are not biologically meaningful on their own.

CHALLENGE DATA
With this information in mind, the Google Cloud Storage Bucket includes the data files listed below. Additional information for each file type can be found here: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0#knowledge-graph-output. Data for all builds can be found here: https://zenodo.org/record/8173020.

Knowledge Graph Data
  PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_OWLNETS.nt
  PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_OWLNETS_NetworkxMultiDiGraph.gpickle

Edge Lists
  PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Identifiers.txt
  PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Integers.txt
  PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Integer_Identifier_Map.json

Metadata
  Node and relation metadata, including labels, synonyms, and definitions. Additional information can be found here:
  PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_NodeLabels.txt
  node_metadata_dict.pkl

CHALLENGE RELATIONS
We will evaluate predictions on 15 Relation Ontology (RO) relations utilized in 34 distinct edge types. Additional details on these edge types can be found here: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0#edge-data. The 15 RO relations and their associated edge types are shown in the table below.
  Participates in: chemical-pathway; gene-pathway; protein-gobp; protein-pathway
  Has function: pathway-gomf; protein-gomf
  Located in: protein-anatomy; protein-cell; protein-gocc; rna-anatomy; rna-cell
  Has component: pathway-gocc
  Has phenotype: disease-phenotype
  Has gene product: gene-protein
  Interacts with: chemical-gene; chemical-protein
  Genetically interacts with: gene-gene
  Molecularly interacts with: chemical-gobp; chemical-gocc; chemical-gomf; protein-catalyst; protein-cofactor; protein-protein
  Transcribed to: gene-rna
  Ribosomally translates to: rna-protein
  Causally influences: variant-gene
  Is substance that treats: chemical-disease; chemical-phenotype
  Causes or contributes to condition: gene-disease; gene-phenotype; variant-disease; variant-phenotype
  Realized in response to: gobp-pathway

BUILD UPDATES
Below we note important updates to each build. For additional information on each build, please see the project Wiki (https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0). For more information on the data sources that are used for each build, see: https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources.

JANUARY 2021
  No data were found for the pathway-gomf edge type. The cause was identified as a change in the input data file, which is downloaded from Reactome.

APRIL 2021
  Significant updates were made to the workflow for building the graphs. This should have only a marginal impact on the resulting knowledge graphs.
  Changes were made to the edge types from the table above. We are no longer supporting `chemical-rna` edge types.
  The filtering applied to the input data when constructing the edges was updated to reduce potential variance in the quality of the resulting edges.
  Please see the descriptions for each data source (here: https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources) for additional information.

MAY 2021
  pathway-gomf edges are back; Reactome appears to have resolved the errors we discovered in January 2021.
  disease-phenotype edge data may change slightly, as the HPO changed the phenotype annotation file and associated formatting.
  The gene-gene edge count has drastically decreased. The cause has been identified as changes that GeneMania has made to their data (change to origin data noted as 04/27/2021).

JUNE 2021
  Build successful. No issues to report.

JULY 2021
  Build successful. No issues to report.

AUGUST 2021
  Build successful. No issues to report.

SEPTEMBER 2021
  Build successful. No issues to report.

OCTOBER 2021
  Build successful. No issues to report.
  Note that some changes were made to the OWL-NETS algorithm, which may result in larger graphs than prior builds. Details on these changes can be found here: https://github.com/callahantiff/PheKnowLator/pull/109
  The following issues (https://github.com/callahantiff/PheKnowLator/issues/116; https://github.com/callahantiff/PheKnowLator/issues/118) identified significant bugs, which needed immediate repair. As a result, a new October build (v3.0.2) was generated on 10/19/21.

NOVEMBER 2021
  Build successful. No issues to report.
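
The knowledge graph data listed under CHALLENGE DATA includes an N-Triples (.nt) serialization. As a rough illustration only, and not part of the PheKnowLator library, the sketch below tallies triples per relation from N-Triples lines using the Python standard library. The function name count_predicates and the example.org subject and object IRIs are hypothetical. The parser is deliberately simplified: it assumes one triple per line and counts only triples whose subject, predicate, and object are all IRIs, so lines containing literals or blank nodes are skipped. A full RDF parser would be more robust.

```python
import re
from collections import Counter
from typing import Iterable

# Matches an N-Triples line whose subject, predicate, and object are all IRIs.
# Simplification: lines with literals, blank nodes, or comments do not match
# and are therefore ignored.
_IRI_TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def count_predicates(lines: Iterable[str]) -> Counter:
    """Count IRI-only triples per predicate IRI (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        match = _IRI_TRIPLE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Hypothetical sample triples: the example.org IRIs are placeholders, not real
# PheKnowLator identifiers. RO_0000056 is the Relation Ontology identifier
# for "participates in".
SAMPLE = [
    '<http://example.org/gene/1> <http://purl.obolibrary.org/obo/RO_0000056> <http://example.org/pathway/1> .',
    '<http://example.org/protein/1> <http://purl.obolibrary.org/obo/RO_0000056> <http://example.org/pathway/1> .',
    '<http://example.org/gene/1> <http://example.org/relation/other> <http://example.org/protein/1> .',
    '<http://example.org/gene/1> <http://www.w3.org/2000/01/rdf-schema#label> "gene one" .',
]

sample_counts = count_predicates(SAMPLE)
for predicate, n in sample_counts.most_common():
    print(predicate, n)
```

To apply the same idea to a downloaded build, an open file handle can be passed in place of the sample list, since the function accepts any iterable of lines. Streaming the file line by line avoids loading a large graph into memory.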
Created:
2023-10-30