Data from: PSMC (Pairwise Sequentially Markovian Coalescent) analysis of RAD (Restriction site Associated DNA) sequencing data

Source: DataONE. Updated: 2016-10-05. Indexed: 2024-06-26.
Download link:
https://search.dataone.org/view/null
Description:
The Pairwise Sequentially Markovian Coalescent (PSMC) method uses the genome sequence of a single individual to estimate demographic history covering a time span of thousands of generations. Although it was originally designed for whole-genome data, we here use simulations to investigate its applicability to reference-genome-aligned RAD (Restriction site Associated DNA) data. We find that RAD data can potentially be used for PSMC analysis, but at present with limitations. The key factor is the proportion (p) of the genome that the RAD data cover. In our simulations, a proportion of 10% can still retain a substantial amount of coalescent information, whereas at 1% estimation becomes unreliable. The performance depends strongly on mutation rate (µ) and recombination rate (r) and is proportional to µp/r. When the value of this term is low, increasing the amount of data and the number of iterations helps restore the power of the estimation. We subsequently analyze one whole-genome-sequenced and 17 RAD-sequenced threespine sticklebacks (Gasterosteus aculeatus) from a lake in Greenland. The whole-genome sequence suggests a relatively recent expansion and decline ca. 4,000-40,000 generations ago, possibly reflecting postglacial expansion and the founding of the lake population. RAD data in which chromosomes from 10 individuals are combined identify a similar pattern. Our study provides guidance on the use of PSMC analysis and suggests measures that can improve its utility for RAD data. Finally, the study shows that RAD loci in general contain coalescent information that can be used for developing more targeted methods.
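
The dependence on µp/r described above can be illustrated with a small calculation. The sketch below is only an illustration, not part of the dataset or of the PSMC software: the function name `scaled_information` and the rate values are hypothetical, chosen to show how the term shrinks as the covered proportion p drops from 100% to 10% to 1% while µ and r are held fixed.

```python
def scaled_information(mu: float, p: float, r: float) -> float:
    """Return mu * p / r, the term the abstract reports performance to be proportional to.

    mu: mutation rate, r: recombination rate, p: proportion of the genome covered.
    """
    if r <= 0:
        raise ValueError("recombination rate r must be positive")
    if not 0 < p <= 1:
        raise ValueError("covered proportion p must be in (0, 1]")
    return mu * p / r


# Hypothetical per-site, per-generation rates, used only for illustration.
MU = 1e-8
R = 1e-8

for p in (1.0, 0.10, 0.01):
    print(f"p = {p:.2f}: mu*p/r = {scaled_information(MU, p, R):.3g}")
```

With µ and r fixed, the term scales linearly with p, so moving from 10% to 1% coverage lowers it tenfold. The same reduction would result from a tenfold lower µ or a tenfold higher r, which is one way to read the abstract's remark that performance depends strongly on both rates.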

Created:
2016-10-05