Data from: De novo transcriptomic analyses for non-model organisms: an evaluation of methods across a multi-species data set
Source: DataONE. Updated 2013-01-09. Indexed 2024-06-27.
Download link:
https://search.dataone.org/view/null
Description:
High-throughput sequencing (HTS) is revolutionizing biological research by enabling scientists to quickly and cheaply query variation at a genomic scale. Although such data are increasingly easy to obtain, using them effectively still poses notable challenges, especially for those working with organisms that lack a high-quality reference genome. At every stage of analysis, from assembly to annotation to variant discovery, researchers have to distinguish technical artefacts from the biological realities of their data before they can make inferences. In this work, I explore these challenges by generating a large de novo comparative transcriptomic data set for a clade of lizards and constructing a pipeline to analyse these data. Then, using a combination of novel metrics and an externally validated variant data set, I test the efficacy of my approach, identify areas for improvement, and propose ways to minimize these errors. I find that with careful data curation, HTS can be a powerful tool for generating genomic data for non-model organisms.
Created:
2013-01-09



