Amino Acid Changes in Disease-Associated Variants Differ Radically from Variants Observed in the 1000 Genomes Project Dataset
Source: Figshare | Updated: 2016-01-18 | Indexed: 2026-04-29
Download link:
https://figshare.com/articles/dataset/_Amino_Acid_Changes_in_Disease_Associated_Variants_Differ_Radically_from_Variants_Observed_in_the_1000_Genomes_Project_Dataset_/876636
Description:
The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric, and the mutabilities of the individual amino acids differ widely. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes the mutability of arginine far higher than that of other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites that are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.
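As a rough illustration of what a directed (asymmetric) exchange matrix and a per-residue mutability look like, the sketch below counts reference-to-variant amino acid changes and normalises the outgoing counts by how often each residue occurs. It is not the authors' pipeline: the function names, the toy variant list, and the composition counts are all hypothetical and chosen only to make the idea concrete.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def exchange_matrix(variants):
    """Count directed exchanges (reference residue -> variant residue)."""
    counts = Counter()
    for ref, alt in variants:
        # Skip synonymous entries and anything outside the 20 standard residues.
        if ref != alt and ref in AMINO_ACIDS and alt in AMINO_ACIDS:
            counts[(ref, alt)] += 1
    return counts


def relative_mutability(counts, composition):
    """Mutations away from each residue divided by that residue's abundance."""
    outgoing = Counter()
    for (ref, _alt), n in counts.items():
        outgoing[ref] += n
    return {aa: outgoing[aa] / total for aa, total in composition.items() if total > 0}


def asymmetry(counts, a, b):
    """Excess of a->b over b->a; zero would mean a symmetric exchange."""
    return counts[(a, b)] - counts[(b, a)]


# Hypothetical toy data, NOT taken from the 1000 Genomes or OMIM datasets.
variants = [
    ("R", "H"), ("R", "H"), ("R", "Q"), ("R", "C"),
    ("H", "R"), ("A", "V"), ("V", "A"), ("A", "T"),
]
composition = {"R": 50, "H": 25, "A": 80, "V": 70}  # assumed residue counts

counts = exchange_matrix(variants)
mut = relative_mutability(counts, composition)

print("R->H minus H->R:", asymmetry(counts, "R", "H"))
print("most mutable residue in toy data:", max(mut, key=mut.get))
```

Because the direction of each change is kept, the count for R to H need not equal the count for H to R, which is the asymmetry the description refers to. Comparing two such matrices, one built from background variants and one from disease-associated variants, is one simple way to look for the anti-correlated exchange profiles mentioned above.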
Created: 2016-01-18



