Ontology based text mining of gene-phenotype associations: application to candidate gene prediction
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/2532613
Description:
Gene-phenotype associations play an important role in understanding
disease mechanisms, which is a prerequisite for treatment
development. A portion of gene-phenotype associations are established
mainly through experiments and made publicly available through
standard resources such as MGI. However, a vast number of
gene-phenotype associations remain buried in the biomedical
literature. Given the large volume of literature, we need
automated text mining tools to reduce the burden of manual
curation of gene-phenotype associations and to develop
comprehensive resources. We developed an ontology-based
approach, combined with statistical methods, to text mine
gene-phenotype associations from the literature. Our method achieved
AUC values of 0.90 and 0.75 in recovering known gene-phenotype
associations from HPO and MGI, respectively. We posit that candidate
genes and their relevant diseases should be described with similar
phenotypes in publications. Thus, we demonstrate the utility of our
approach by predicting disease candidate genes based on the semantic
similarity of the phenotypes associated with genes and diseases.
We evaluated our disease candidate prediction model on
the gene-disease associations from MGI. Our model achieved AUC
values of 0.90 and 0.87 on the OMIM (human) and MGI (mouse) datasets
of gene-disease associations, respectively. Our manual analysis of
the text-mined data revealed that our method can accurately extract
gene-phenotype associations that are not currently covered by
existing public gene-phenotype resources. Overall, the results
indicate that our method can precisely extract both known and new
gene-phenotype associations from the literature. The dataset
released on Zenodo contains the gene-phenotype associations we
extracted from the literature. All methods used to extract the data
are available at https://github.com/bio-ontology-research-group/genepheno.
Created:
2020-01-24



