DataSheet9_Profiling the Genome-Wide Landscape of Short Tandem Repeats by Long-Read Sequencing.PDF

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://figshare.com/articles/dataset/DataSheet9_Profiling_the_Genome-Wide_Landscape_of_Short_Tandem_Repeats_by_Long-Read_Sequencing_PDF/19710172
Description:
Background: Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases and in the regulation of gene expression. Long-read sequencing (LRS) offers a potential solution for genome-wide STR analysis. However, LRS-based characterization of STRs in human genomes at a large population scale has not been reported.

Methods: We conducted a large-scale LRS-based STR analysis of 193 unrelated samples from the Chinese population and performed genome-wide profiling of STR variation in the human genome. The repeat dynamic index (RDI) was introduced to quantify STR variability. Expression data were obtained from the Genotype-Tissue Expression (GTEx) project to explore the tissue specificity of genes related to highly variable STRs. Enrichment analyses were also conducted to identify potential functional roles of the highly variable STRs.

Results: This study reports a large-scale analysis of human STR variation by LRS and provides a reference STR database built from the LRS dataset. We found that disease-associated STRs (dSTRs) and STRs associated with the expression of nearby genes (eSTRs) were highly variable in the general population. Moreover, tissue-specific expression analysis showed that genes related to highly variable STRs had their highest expression levels in brain tissues, and pathway enrichment analysis found that these STRs are involved in synaptic function-related pathways.

Conclusion: Our study profiled the genome-wide landscape of STRs using LRS and highlighted the highly variable STRs in the human genome, providing a valuable resource for studying the role of STRs in human disease and complex traits.

Created: 2022-05-05