
Curated eutherian third party data gene data sets

Source: NIAID Data Ecosystem; indexed 2026-03-10
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA453891
Description:
Public eutherian reference genomic sequence data sets ushered in a new era in the biological and medical sciences. For example, one major aim of the initial sequencing and analysis of the human genome was to update and revise human gene data sets and to uncover potential new drugs, drug targets, and molecular markers for medical diagnostics. In addition, the most recent research projects in eutherian comparative genomics have stated the intention to sequence the genome of every extant eutherian species in the foreseeable future. Yet, owing to potential genomic sequence errors and the incompleteness of reference genomic sequences, future revisions and updates of eutherian genomic sequence data sets were expected. The potential genomic sequence errors included analytical and bioinformatical errors (erroneous gene annotations, genomic sequence misassemblies) and Sanger DNA sequencing errors (artefactual nucleotide deletions, insertions and substitutions). In addition, the human protein coding gene census remained unfinished: whereas the initial integrated human gene index included about 32000 known and predicted protein coding genes, contemporary estimates put the number at about 20000-21000 protein coding genes in the human genome. Thus, under the open science research project "Comparative genomic analysis of eutherian genes", the eutherian comparative genomic analysis protocol RRID:SCR_014401 was established as one framework for eutherian gene data set revisions. The protocol, which includes gene annotations, phylogenetic analysis and protein molecular evolution analysis, introduced 3 new tests: (1) a test of reliability of public eutherian genomic sequences using genomic sequence redundancies, (2) a test of contiguity of public eutherian genomic sequences using multiple pairwise genomic sequence alignments and (3) a test of protein molecular evolution using relative synonymous codon usage statistics.
The omnibus research project curated 14 eutherian gene data sets implicated in major physiological and pathological processes, comprising, in aggregate, 2615 published complete coding sequences that were deposited in the European Nucleotide Archive as third party data gene data sets.
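The third test named in the description relies on relative synonymous codon usage (RSCU) statistics. As a rough illustration of what such a statistic measures, the sketch below computes RSCU for a single coding sequence using the conventional definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. This is an assumption-laden sketch, not the protocol's own test. The function name `rscu`, the use of the standard genetic code, and the handling of stop codons and unrecognised triplets are illustrative choices that do not come from the description.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (assumed here; not stated in the description).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for one coding sequence (hypothetical helper).

    RSCU = observed codon count * number of synonymous codons
           / total count of all codons for that amino acid.
    """
    seq = cds.upper().replace("U", "T")
    # Split into complete triplets; a trailing partial codon is dropped.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Triplets with ambiguous bases (e.g. N) are ignored.
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*":
            continue  # stop codons are left out of this sketch
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy sequence: Met, Ala (GCT), Ala (GCT), Ala (GCC), stop.
    for codon, value in sorted(rscu("ATGGCTGCTGCCTAA").items()):
        print(codon, round(value, 3))
```

A value of 1 means a codon is used as often as expected under equal usage, values above 1 indicate a preferred codon, and values below 1 an avoided one. Within each amino acid present in the sequence, the values sum to the number of synonymous codons.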
Created:
2018-04-27