Data from: In solution hybridization for mammalian mitogenome enrichment: pros, cons, and challenges associated with multiplexing degraded DNA
Source: DataONE. Updated: 2015-07-21. Indexed: 2024-06-27.
Download link:
https://search.dataone.org/view/null
Description:
Here we present a set of RNA-based probes for in-solution enrichment of whole mitochondrial genomes, targeting a diversity of mammalian mitogenomes. The probe set was designed from 7 mammalian orders and tested to determine its utility for enriching degraded DNA. We generated 63 mitogenomes representing five orders and 22 genera of mammals, with coverage ranging from 0 to > 5,400X. Based on a threshold of 70% mitogenome recovery and at least 10X average coverage, 32 samples (51%) were considered successful. The estimated sequence divergence of samples from the probe sequences used to construct the array ranged up to 20%. Sample type was more predictive of mitogenome recovery than sample age. The proportion of reads from each individual in multiplexed enrichments was highly skewed, with each pool having one sample that yielded a majority of the reads. Recovery across each mitochondrial gene varied, with most samples exhibiting regions with gaps or ambiguous sites. We estimated the ability of the probes to capture mitogenomes from a diversity of mammalian taxa not included here by performing a clustering analysis of published sequences for 100 taxa representing most mammalian orders. Our study demonstrates that a general array can be cost- and time-effective when there is a need to screen a modest number of individuals from a variety of taxa. We also address practical concerns for using such a tool, including pooling samples and generating high-quality mitogenomes, and detail a pipeline to remove chimeric molecules.
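The success criterion in the description (70% mitogenome recovery and at least 10X average coverage) can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Sample` record, its field names, and the choice to treat both thresholds as inclusive are assumptions made here for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record; field names are assumptions, not from the dataset.
    name: str
    fraction_recovered: float  # share of the mitogenome recovered, 0.0 to 1.0
    mean_coverage: float       # average read depth, in X


def is_successful(sample, min_recovery=0.70, min_coverage=10.0):
    """Apply the two thresholds from the description (assumed inclusive)."""
    return (sample.fraction_recovered >= min_recovery
            and sample.mean_coverage >= min_coverage)


def success_percentage(n_successful, n_total):
    """Percentage of successful samples, rounded to the nearest integer."""
    if n_total == 0:
        raise ValueError("n_total must be positive")
    return round(100 * n_successful / n_total)


if __name__ == "__main__":
    # Made-up samples for illustration only; not values from the study.
    demo = [
        Sample("demo_a", 0.95, 250.0),
        Sample("demo_b", 0.80, 4.0),    # enough recovery, too little coverage
        Sample("demo_c", 0.40, 30.0),   # enough coverage, too little recovery
    ]
    passed = [s.name for s in demo if is_successful(s)]
    print(passed)
    # The counts reported in the description: 32 successful out of 63.
    print(success_percentage(32, 63))
```

With the counts reported in the description, 32 of 63 samples works out to roughly 51%, which matches the stated figure.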
Created:
2015-07-21



