Data from: Short tree, long tree, right tree, wrong tree: new acquisition bias corrections for inferring SNP phylogenies

Source: DataONE; updated 2015-08-11; indexed 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practices for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. 
Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the presence of missing data. Phylogenetic analysis of RAD loci requires careful attention to model assumptions, especially if downstream analyses depend on branch lengths.
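
The two corrections described above can be sketched as log-likelihoods. This is a hedged illustration, not the authors' exact formulation: the symbols $D_i$ (the $i$-th of $n$ SNP site patterns), $\theta$ (tree and model parameters), $P_x(\theta)$ (probability that a site is invariant for nucleotide $x$), and $w_x$ (user-supplied count of unsampled invariant sites for nucleotide $x$, summing to a total $m$) are assumptions introduced here, and the per-nucleotide split of $m$ is one possible way to supply the count.

```latex
% Conditional likelihood: condition on every sampled site being variable.
\ln L_{\mathrm{cond}}(\theta)
  = \sum_{i=1}^{n} \ln P(D_i \mid \theta)
  \;-\; n \,\ln\!\Bigl(1 - \sum_{x \in \{A,C,G,T\}} P_x(\theta)\Bigr)

% Reconstituted DNA: add back the m unsampled invariant sites explicitly.
\ln L_{\mathrm{recon}}(\theta)
  = \sum_{i=1}^{n} \ln P(D_i \mid \theta)
  \;+\; \sum_{x \in \{A,C,G,T\}} w_x \,\ln P_x(\theta),
  \qquad \sum_{x} w_x = m
```

In the first expression the number of invariant sites never appears, which matches the description that it "is not considered"; in the second, the user-specified count enters directly, so an inaccurate $m$ would be expected to affect the branch length estimates.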

Created: 2015-08-11