Data from: An evaluation of transcriptome-based exon capture for frog phylogenomics across multiple scales of divergence (Class: Amphibia, Order: Anura)
Source: DataONE | Updated: 2016-05-27 | Indexed: 2024-06-26
Download link:
https://search.dataone.org/view/null
Description:
Custom sequence capture experiments are becoming an efficient approach for gathering large sets of orthologous markers in nonmodel organisms. Transcriptome-based exon capture utilizes transcript sequences to design capture probes, typically using a reference genome to identify intron–exon boundaries to exclude shorter exons (<200 bp). Here, we test directly using transcript sequences for probe design, which are often composed of multiple exons of varying lengths. Using 1260 orthologous transcripts, we conducted sequence captures across multiple phylogenetic scales for frogs, including outgroups ~100 Myr divergent from the ingroup. We recovered a large phylogenomic data set consisting of sequence alignments for 1047 of the 1260 transcriptome-based loci (~561 000 bp) and a large quantity of highly variable regions flanking the exons in transcripts (~70 000 bp), the latter increasing substantially when only ingroup species are included (~797 000 bp). We recovered both shorter (<100 bp) and longer exons (>200 bp), with no major reduction in coverage towards the ends of exons. We observed significant differences in the performance of blocking oligos for target enrichment and nontarget depletion during captures, and differences in PCR duplication rates resulting from the number of individuals pooled for capture reactions. We explicitly tested the effects of phylogenetic distance on capture sensitivity, specificity, and missing data, and provide a baseline estimate of expectations for these metrics based on a priori knowledge of nuclear pairwise differences among samples. Based on our results, we provide recommendations for transcriptome-based exon capture design, give cost estimates, and offer multiple pipelines for data assembly and analysis.
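As a rough illustration of the metrics named in the description, the sketch below computes the locus recovery rate from the reported counts (1047 of 1260 loci) and shows one common way capture sensitivity and specificity are defined. The definitions used here (sensitivity as the fraction of target bases covered by at least one read, specificity as the fraction of reads mapping to targets) are assumptions based on common usage in the exon-capture literature, not taken from this record, and all function names are hypothetical.

```python
def recovery_rate(recovered_loci: int, targeted_loci: int) -> float:
    """Fraction of targeted loci for which an alignment was recovered."""
    if targeted_loci <= 0:
        raise ValueError("targeted_loci must be positive")
    return recovered_loci / targeted_loci


def capture_sensitivity(target_bases_covered: int, target_bases_total: int) -> float:
    """Assumed definition: share of target bases covered by at least one read."""
    if target_bases_total <= 0:
        raise ValueError("target_bases_total must be positive")
    return target_bases_covered / target_bases_total


def capture_specificity(reads_on_target: int, reads_total: int) -> float:
    """Assumed definition: share of cleaned reads that map to target sequences."""
    if reads_total <= 0:
        raise ValueError("reads_total must be positive")
    return reads_on_target / reads_total


if __name__ == "__main__":
    # Counts reported in the description: 1047 of 1260 loci recovered.
    print(f"Locus recovery: {recovery_rate(1047, 1260):.1%}")
```

With the counts reported above, the recovery rate comes to roughly 83% of the targeted loci. The sensitivity and specificity helpers need per-sample read and coverage counts, which this record does not list.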
Created:
2016-05-27



