Curated eutherian third party data gene data sets
Source: NIAID Data Ecosystem. Indexed: 2026-03-10
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA453891
Resource description:
Public eutherian reference genomic sequence data sets ushered in a new era in the biological and medical sciences. For example, one major aim of the initial sequencing and analysis of the human genome was to update and revise human gene data sets and to uncover potential new drugs, drug targets and molecular markers for medical diagnostics. In addition, the most recent research projects in eutherian comparative genomics intended to sequence the genome of every extant eutherian species in the foreseeable future. Yet, because of potential genomic sequence errors and the incompleteness of reference genomic sequences, future revisions and updates of eutherian genomic sequence data sets were expected. The potential genomic sequence errors included analytical and bioinformatical errors (erroneous gene annotations, genomic sequence misassemblies) and Sanger DNA sequencing errors (artefactual nucleotide deletions, insertions and substitutions). In addition, the human protein coding gene census remained unfinished: whereas the initial integrated human gene index included about 32,000 known and predicted protein coding genes, contemporary estimates put the number at about 20,000 to 21,000 protein coding genes in the human genome. Thus, under the open science research project "Comparative genomic analysis of eutherian genes", the eutherian comparative genomic analysis protocol RRID:SCR_014401 was established as one framework for eutherian gene data set revisions. The protocol, which includes gene annotations, phylogenetic analysis and protein molecular evolution analysis, introduced three new tests: (1) a test of reliability of public eutherian genomic sequences using genomic sequence redundancies, (2) a test of contiguity of public eutherian genomic sequences using multiple pairwise genomic sequence alignments and (3) a test of protein molecular evolution using relative synonymous codon usage statistics.
The omnibus research project curated 14 eutherian gene data sets implicated in major physiological and pathological processes. In aggregate, these comprise 2615 published complete coding sequences that were deposited in the European Nucleotide Archive as third party data gene data sets.
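The third test named above relies on relative synonymous codon usage (RSCU) statistics. The description does not say how the protocol computes or applies them, so the following is only a minimal sketch of the textbook RSCU definition (observed count of a codon divided by the mean count over all synonymous codons for the same amino acid), not the protocol's actual test. The function name `rscu`, the use of the standard genetic code and the example sequence are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), built in TCAG order.
# Assumption: the standard nuclear code is the appropriate one here.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def rscu(cds):
    """Return {codon: RSCU} for a complete coding sequence (hypothetical helper).

    RSCU = observed codon count / (amino acid total / number of synonymous codons).
    Stop codons are skipped; amino acids absent from the sequence are omitted.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")

    # Count every in-frame codon; codons with ambiguous bases are counted
    # here but never looked up below, so they are effectively ignored.
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))

    # Group sense codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":
            families[amino_acid].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not used in this sequence
        expected = total / len(codons)  # count under uniform codon usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    # Toy sequence for illustration only: Met, Ala, Ala, Ala, stop.
    for codon, value in sorted(rscu("ATGGCTGCTGCCTAA").items()):
        print(codon, round(value, 3))
```

An RSCU value of 1 means a codon is used as often as expected under uniform usage within its synonymous family; values above or below 1 indicate over- or under-representation.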
Created:
2018-04-27



