CGG, CAG, and GAA: genome-wide comparison of the disease linked Trinucleotide short tandem repeat
Source: DataCite Commons | Updated: 2026-03-05 | Indexed: 2026-04-25
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.5tb2rbpgt
Resource description:
Short tandem repeats (STRs) are tracts of 1–6 bp DNA motifs repeated in a
head-to-tail fashion, collectively accounting for approximately 3% of the
human genome. Among these, trinucleotide STRs are particularly relevant
because of their involvement in human genetic disorders: expansions of
CGG, CAG, and GAA repeats cause Fragile X syndrome, Huntington’s disease,
and Friedreich’s ataxia, respectively. In this study, we systematically
examined the genomic distribution, abundance, repeat length, and
polymorphism of 5,963 CGG, 11,220 CAG, and 16,105 GAA loci across a cohort
of 191 healthy individuals. Marked differences were observed between the
three repeat classes. CGG STRs, while the least abundant, were strongly
enriched within exonic and promoter regions and exhibited the highest
levels of polymorphism, particularly in genic regions. GAA STRs were by
far the most abundant and displayed the greatest overall variability; most
were located in intergenic and intronic regions, and they showed minimal
polymorphism in exons and 5′-UTRs. In contrast, CAG STRs were more
evenly distributed across genic and intergenic regions and were strikingly
stable, even though they are known to become pathogenic when expanded
beyond certain thresholds. These findings demonstrate that trinucleotide STR
classes are not interchangeable but exhibit unique genomic and
evolutionary characteristics. Nucleotide composition emerges as a key
determinant of STR localization, stability, and variability, suggesting
that the biological roles of these repeats are intrinsically tied to their
motif sequence. Our study underscores the importance of analyzing STR
classes individually, as grouping them solely by motif length risks
overlooking significant functional distinctions.
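
To make the notion of a head-to-tail trinucleotide tract concrete, the following is a minimal Python sketch that locates uninterrupted runs of a given motif in a DNA string. It is an illustration only and not the method used to build this dataset: the function name `find_tracts`, the `min_copies` threshold of 3, and the toy sequence are all assumptions, and the sketch ignores the reverse strand and imperfect (interrupted) repeats.

```python
import re

def find_tracts(seq, motif, min_copies=3):
    """Return (start, copies) for each uninterrupted run of `motif`.

    `min_copies` is a hypothetical cutoff chosen for illustration.
    """
    # Builds a pattern such as (?:CGG){3,} for motif "CGG".
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    return [
        (m.start(), len(m.group()) // len(motif))
        for m in pattern.finditer(seq.upper())
    ]

# Toy sequence (made up): 4 x CGG, 3 x CAG, and 5 x GAA separated by spacers.
seq = "TT" + "CGG" * 4 + "AT" + "CAG" * 3 + "TT" + "GAA" * 5 + "C"

for motif in ("CGG", "CAG", "GAA"):
    print(motif, find_tracts(seq, motif))
```

Each tuple holds the 0-based start position and the number of motif copies in the tract. Comparing copy numbers at the same locus across individuals is the kind of measurement that a polymorphism analysis builds on.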
Provider:
Dryad
Created:
2025-10-23



