Data from: SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

DataONE | Updated 2013-09-19 | Indexed 2024-06-27

Download link:

https://search.dataone.org/view/null

Description:

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules and a fourth control module that can automate analyses of large volumes of data. The analysis modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the three analysis modules can also be used independently, either for greater flexibility or to work with FASTQ or FASTA files generated by other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance in a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity of a single DNA sequencing run. All modules of SSR_pipeline are implemented in the Python programming language and can therefore be used on nearly any computer operating system (Linux, Macintosh, and Windows).
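
To make the third step concrete, the following is a minimal sketch of how perfect (simple) microsatellites with motifs of 2 to 25 bp might be located in a single sequence using a regular expression with a backreference. It is not the SSR_pipeline algorithm, which this record does not describe in detail; the function name `find_ssrs`, its parameters, and the default of four repeats are assumptions chosen for illustration, and compound microsatellites are not handled.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=25, min_repeats=4):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Hypothetical helper for illustration only; not part of SSR_pipeline.
    """
    # The lazy quantifier makes the regex try the shortest motif first,
    # so "ACACACAC" is reported as motif "AC", not "ACAC".
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        # Skip homopolymer runs (e.g. "AAAAAAAA" matched as motif "AA").
        if len(set(motif)) == 1:
            continue
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits
```

For example, `find_ssrs("TTG" + "AC" * 6 + "GGT")` reports one dinucleotide repeat starting at 0-based position 3. A regular expression keeps the sketch short, but it only finds uninterrupted repeats; a production tool would also need to handle ambiguous bases and compound repeats.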


Created:

2013-09-19