
Frameshifts and wild-type protein sequences are always highly similar because the genetic code is optimal for frameshift tolerance

Source: Figshare. Updated: 2021-04-17. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Frameshifts_and_wild-type_protein_sequences_are_always_highly_similar_because_the_genetic_code_is_optimal_for_frameshift_tolerance/9948050/1
Resource description:
Frameshift mutations yield truncated, dysfunctional proteins, leading to loss of function, genetic disorders, or even death. They have been considered mostly harmful and of little importance for the molecular evolution of proteins. Frameshift protein sequences, encoded by the alternative reading frames of a coding gene, have therefore been considered meaningless. However, existing studies have shown that frameshift genes and proteins are widespread and sometimes functional. It is puzzling how a frameshift protein retains its structure and function when its amino acid sequence is changed substantially. We show here that frameshift protein sequences are highly conserved relative to the wild-type protein sequence, and that the similarities among the three protein sequences encoded in the three reading frames of a coding gene are defined mainly by the genetic code. In the standard genetic code, the amino acid substitutions that result from frameshift codon substitutions are far more conservative than those that result from random substitutions. The frameshift tolerability of the standard genetic code ranks in the top 1.0-5.0% of all possible genetic codes, showing that the genetic code is optimal in terms of frameshift tolerance. In some species, this shiftability is further enhanced at the gene or genome level by biased usage of codons and codon pairs, with frameshift-tolerable codons and codon pairs overrepresented in their genomes. Supplemental files are available at FigShare.
File S1 contains frameshift similarity data; File S2 contains frameshift substitution scores of the natural genetic code; File S3 contains frameshift substitution scores of the alternative genetic codes; File S4 contains frameshift substitution scores for different codon usages; File S5 contains frameshift substitution scores for different codon pair usages. Coding sequence data are available at GenBank, Ensembl, or the UCSC Genome Database. The code used to analyze the data can be found at https://github.com/CAUSA/Frameshift.
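To make the idea of "three protein sequences encoded in the three reading frames" concrete, here is a minimal Python sketch. It is an illustration written for this listing, not the code from the CAUSA/Frameshift repository. The helper names (`translate`, `frameshift_pairs`), the example sequence, and the use of plain identity counting in place of the substitution scores used in the study are all assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate a DNA coding sequence starting at offset `frame` (0, 1 or 2).

    Hypothetical helper: frame 0 is the wild-type reading frame, frames 1
    and 2 are the two shifted frames. Trailing incomplete codons are dropped.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def frameshift_pairs():
    """List (wild-type aa, shifted aa) for every codon plus one following base.

    For a codon xyz followed by base n, shifting the frame by one base
    gives the codon yzn. There are 64 * 4 = 256 such combinations.
    """
    pairs = []
    for codon in CODON_TABLE:
        for next_base in BASES:
            shifted = codon[1:] + next_base
            pairs.append((CODON_TABLE[codon], CODON_TABLE[shifted]))
    return pairs


if __name__ == "__main__":
    cds = "ATGGCCAAATAA"  # made-up example sequence, not from the dataset
    for frame in range(3):
        print(frame, translate(cds, frame))
    pairs = frameshift_pairs()
    unchanged = sum(1 for wt, fs in pairs if wt == fs)
    print(f"{unchanged} of {len(pairs)} shifted codons keep the same residue")
```

Identity counting is the crudest possible comparison. Reproducing the conservativeness analysis described above would require scoring each pair with an amino acid substitution matrix, which this sketch deliberately leaves out.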
Provider:
Xiaolong Wang
Created:
2019-10-08