Lost in translation: the pitfalls of Ensembl gene annotations between human genome assemblies and their impact on diagnostics

DataCite Commons | Updated: 2023-08-29 | Indexed: 2024-08-18
Download link:
https://tandf.figshare.com/articles/dataset/Lost_in_translation_the_pitfalls_of_Ensembl_gene_annotations_between_human_genome_assemblies_and_their_impact_on_diagnostics/23709768/1
Description:
Gene models based on the GRCh37 human genome assembly are preferred by many international projects over the updated assemblies (GRCh38 and T2T). Discrepant genes (DGs), those recognized as protein coding in the new but not the old assembly, are ignored by several genomic resources and discarded by variant prioritization tools relying on information based on GRCh37. We curated a set of Ensembl genes with discrepant annotations between GRCh37 and GRCh38, additionally matching their RefSeq transcripts. Furthermore, we examined their clinical and phenotypic relevance. A total of 337 genes were reclassified as ‘protein-coding’ in GRCh38 but not in GRCh37, with 194 having a discrepant HGNC gene symbol. Many remain missing from the current known RefSeq gene models (N = 73). We found many clinically relevant genes in this group of neglected genes, and we anticipate that many more will be found relevant in the future. Important additional annotations such as evolutionary constraint metrics are also not calculated for these genes, further relegating them into oblivion. For discrepant genes, the inaccurate label of ‘non-protein-coding’ has relevant ramifications for clinical genetics. Accurate collation of these genes allows for manual curation in clinically relevant scenarios.
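
The definition of a discrepant gene given above (protein coding in GRCh38 but not in GRCh37) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the gene identifiers and biotype tables are made-up toy data, the function name `discrepant_genes` is hypothetical, and treating a gene that is absent from the GRCh37 table as "not protein coding" is an assumption made here for illustration.

```python
from typing import Dict, Set

# Ensembl labels protein-coding genes with the biotype string "protein_coding".
PROTEIN_CODING = "protein_coding"


def discrepant_genes(grch37: Dict[str, str], grch38: Dict[str, str]) -> Set[str]:
    """Return gene IDs labelled protein coding in GRCh38 but not in GRCh37.

    Assumption: a gene missing from the GRCh37 table counts as
    'not protein coding' in GRCh37.
    """
    return {
        gene_id
        for gene_id, biotype in grch38.items()
        if biotype == PROTEIN_CODING and grch37.get(gene_id) != PROTEIN_CODING
    }


# Toy biotype tables with made-up identifiers (not real Ensembl IDs).
grch37_biotypes = {
    "GENE_A": "protein_coding",
    "GENE_B": "lincRNA",
    "GENE_C": "pseudogene",
}
grch38_biotypes = {
    "GENE_A": "protein_coding",
    "GENE_B": "protein_coding",  # reclassified between assemblies
    "GENE_C": "pseudogene",
    "GENE_D": "protein_coding",  # absent from the GRCh37 table
}

print(sorted(discrepant_genes(grch37_biotypes, grch38_biotypes)))
```

In practice the two tables would be built from the Ensembl annotation releases for each assembly and keyed on stable gene IDs. The sketch shows only the comparison step.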
Provider:
Taylor & Francis
Created:
2023-07-19