Supporting data for "HAMAP as SPARQL rules – A portable annotation pipeline for genomes and proteomes"

Source: DataCite Commons. Updated 2025-05-26; indexed 2025-04-15.
Download link:
http://gigadb.org/dataset/100683
Description:
Genome and proteome annotation pipelines are generally custom built and therefore not easily reusable by other groups, which leads to duplication of effort, increased costs, and suboptimal results. One cost-effective way to increase the data quality in public databases is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. We have translated the rules of our HAMAP proteome annotation pipeline to queries in the W3C standard SPARQL 1.1 syntax and applied them with two off-the-shelf SPARQL engines to UniProtKB/Swiss-Prot protein sequences described in RDF format. This approach is applicable to any genome or proteome annotation pipeline and greatly simplifies its reuse. HAMAP SPARQL rules and documentation are freely available for download from the HAMAP FTP site ftp://ftp.expasy.org/databases/hamap/sparql/ under a CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license.

This supporting dataset contains the data demonstrating that our new "HAMAP as SPARQL rules" approach yields the same results as our existing custom implementation of the HAMAP proteome annotation pipeline, as described in Figure 6 of the GigaScience publication. The file swissprot_go_term_and_keyword_annotations.nq.gz contains an N-Quads representation of the GO term and UniProt keyword annotations generated with our custom HAMAP pipeline and with the new HAMAP SPARQL rules, in separate named graphs for comparison. To reproduce the generation of these annotations with the HAMAP SPARQL rules, one can use the release 2019_10 version of the rules (file hamap_sparql_2019_10_snapshot.tar.gz) and Swiss-Prot (ftp://ftp.uniprot.org/pub/databases/uniprot/previous_releases/release-2019_10/knowledgebase/uniprot_sprot-only2019_10.tar.gz).
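Because the two sets of annotations sit in separate named graphs of one N-Quads file, the comparison can be sketched as a set difference over the triples in each graph. The Python below is only an illustration under stated assumptions, not the authors' method. The function names are hypothetical, the graph IRIs used in the file are not given in this description and must be looked up in the file itself, and the line splitter is a simplified pattern rather than a complete N-Quads parser.

```python
import gzip
import re
from collections import defaultdict

# Simplified N-Quads term pattern (an assumption, not a full parser):
# an IRI, a blank node, or a literal with optional datatype or language tag.
TERM = re.compile(
    r'<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)


def triples_by_graph(lines):
    """Group (subject, predicate, object) triples by their graph label."""
    graphs = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = TERM.findall(line)
        if len(terms) == 4:
            s, p, o, g = terms
        elif len(terms) == 3:
            (s, p, o), g = terms, None  # statement in the default graph
        else:
            raise ValueError(f"cannot split line into terms: {line!r}")
        graphs[g].add((s, p, o))
    return graphs


def compare_graphs(graphs, graph_a, graph_b):
    """Return the triples found only in graph_a and only in graph_b."""
    a = graphs.get(graph_a, set())
    b = graphs.get(graph_b, set())
    return a - b, b - a


def load_nquads_gz(path):
    """Read a gzip-compressed N-Quads file into per-graph triple sets."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return triples_by_graph(handle)
```

A call such as `compare_graphs(load_nquads_gz("swissprot_go_term_and_keyword_annotations.nq.gz"), graph_a, graph_b)`, with the two graph labels (including their angle brackets) taken from the file, would return two empty sets if both pipelines produced identical annotations. Holding every triple in memory is a simplification; for a file of this size a streaming or database-backed comparison may be more practical.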
Provider:
GigaScience Database
Created:
2020-01-15