Identifying High-Priority Proteins Across the Human Diseasome Using Semantic Similarity
Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/Identifying_High-Priority_Proteins_Across_the_Human_Diseasome_Using_Semantic_Similarity/7182968
Description:
Identifying the genes and proteins associated with a biological
process or disease is a central goal of the biomedical research enterprise.
However, relatively few systematic approaches are available that provide
objective evaluation of the genes or proteins known to be important
to a research topic, and hence researchers often rely on subjective
evaluation of domain experts and laborious manual literature review.
Computational bibliometric analysis, in conjunction with text mining
and data curation, attempts to automate this process and return prioritized
proteins in any given research topic. We describe here a method to
identify and rank protein–topic relationships by calculating
the semantic similarity between a protein and a query term in the
biomedical literature while adjusting for the impact and immediacy
of associated research articles. We term the calculated metric the
weighted copublication distance (WCD) and show that it compares well
to related approaches in predicting benchmark protein lists in multiple
biological processes. We used WCD to extract prioritized “popular
proteins” across multiple cell types, subanatomical regions,
and standardized vocabularies containing over 20 000 human
disease terms. The collection of protein–disease associations
across the resulting human “diseasome” supports data
analytical workflows to perform reverse protein-to-disease queries
and functional annotation of experimental protein lists. We envision
that the described improvement to the popular proteins strategy will
be useful for annotating protein lists and guiding method development
efforts as well as generating new hypotheses on understudied disease
proteins using bibliometric information.
Created:
2018-10-09



