Data from: SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data
Source: DataCite Commons. Updated 2025-04-01; indexed 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.n65k2
Description:
SSR_pipeline is a flexible set of programs designed to efficiently
identify simple sequence repeats (e.g., microsatellites) from paired-end
high-throughput Illumina DNA sequencing data. The program suite contains 3
analysis modules along with a fourth control module that can automate
analyses of large volumes of data. The modules are used to 1) identify the
subset of paired-end sequences that pass Illumina quality standards, 2)
align paired-end reads into a single composite DNA sequence, and 3)
identify sequences that possess microsatellites (both simple and compound)
conforming to user-specified parameters. The microsatellite search
algorithm is extremely efficient, and we have used it to identify repeats
with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can
also be used independently to provide greater flexibility or to work with
FASTQ or FASTA files generated from other sequencing platforms (Roche 454,
Ion Torrent, etc.). We demonstrate use of the program with data from the
brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical
timing benchmarks to illustrate program performance on a common desktop
computer environment. We further show that the Illumina platform is
capable of identifying large numbers of microsatellites, even when using
unenriched sample libraries and a very small percentage of the sequencing
capacity from a single DNA sequencing run. All modules from SSR_pipeline
are implemented in the Python programming language and can therefore be
used from nearly any computer operating system (Linux, Macintosh, and
Windows).
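
To make the third step (identifying sequences that contain microsatellites matching user-specified parameters) more concrete, here is a minimal Python sketch of a simple repeat search. It is not the SSR_pipeline algorithm, which this description does not specify. It is a plain regular-expression scan for perfect tandem repeats, and the function name `find_ssrs` and its parameters (`min_motif`, `max_motif`, `min_repeats`) are hypothetical names chosen for illustration. Compound microsatellites are not handled.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Illustrative only: names and defaults are assumptions, not taken
    from SSR_pipeline.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-base motif followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. ATAT is just AT repeated), to avoid duplicate reports.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


if __name__ == "__main__":
    example = "GGG" + "AC" * 6 + "CCC" + "GAT" * 5 + "C"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Raising `max_motif` to 25 would cover the motif range mentioned above, at the cost of one extra scan of the sequence per motif length.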
Provider:
Dryad
Created:
2013-09-19



