Data from: De novo transcriptomic analyses for non-model organisms: an evaluation of methods across a multi-species data set

Source: DataONE. Updated: 2013-01-09. Indexed: 2024-06-27.

Download link:

https://search.dataone.org/view/null

Description:

High-throughput sequencing (HTS) is revolutionizing biological research by enabling scientists to quickly and cheaply query variation at a genomic scale. Despite the increasing ease of obtaining such data, using these data effectively still poses notable challenges, especially for those working with organisms without a high-quality reference genome. For every stage of analysis, from assembly to annotation to variant discovery, researchers have to distinguish technical artefacts from the biological realities of their data before they can make inferences. In this work, I explore these challenges by generating a large de novo comparative transcriptomic data set for a clade of lizards and constructing a pipeline to analyse these data. Then, using a combination of novel metrics and an externally validated variant data set, I test the efficacy of my approach, identify areas of improvement, and propose ways to minimize these errors. I find that with careful data curation, HTS can be a powerful tool for generating genomic data for non-model organisms.

Created:

2013-01-09