Data from: Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants
Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.63h33
Description:
Background: Simple Sequence Repeats (SSRs) are widely used in population
genetic studies but their classical development is costly and
time-consuming. The ever-growing body of DNA data generated by
high-throughput techniques offers an inexpensive alternative for SSR
discovery. Expressed Sequence Tags (ESTs) have been widely used as an SSR
source for plants of economic relevance, but their application to non-model
species is still modest. Methods: Here, we explored the use of publicly
available ESTs (GenBank at the National Center for Biotechnology
Information, NCBI) for SSR development in non-model plants, focusing on
genera listed by the International Union for Conservation of Nature
(IUCN). We also searched two model genera with fully annotated genomes,
Arabidopsis and Oryza, for EST-SSRs and used them as controls for genome
distribution analyses. Overall, we downloaded 16 031 555 sequences for 258
plant genera, which were mined for SSRs and their primers with the help of
QDD1. Genome distribution analyses in Oryza and Arabidopsis were done by
aligning the SSR-containing sequences against the Oryza sativa and
Arabidopsis thaliana reference genomes using the Basic Local Alignment
Search Tool (BLAST) on the NCBI website. Finally, we performed an empirical test to
determine the performance of our EST-SSRs in a few individuals from four
species of two eudicot genera, Trifolium and Centaurea. Results: We
explored a total of 14 498 726 EST sequences from the dbEST database
(NCBI) in 257 plant genera from the IUCN Red List. We identified a very
large number (17 102) of ready-to-test EST-SSRs in most plant genera (193)
at no cost. Overall, dinucleotide and trinucleotide repeats were the
prevalent types but the abundance of the various types of repeat differed
between taxonomic groups. Control genomes revealed that trinucleotide
repeats were mostly located in coding regions while dinucleotide repeats
were largely associated with untranslated regions. Our results from the
empirical test revealed considerable amplification success and
transferability between congenerics. Conclusions: The present work
represents the first large-scale study developing SSRs by utilizing
publicly accessible EST databases in threatened plants. Here we provide a
very large number of ready-to-test EST-SSRs (17 102) for 193 genera. The
cross-species transferability suggests that the number of possible target
species would be large. Since trinucleotide repeats are abundant and
mainly linked to exons, they might be useful in evolutionary and
conservation studies. Altogether, our study strongly supports the use of EST
databases as an extremely affordable and fast alternative for SSR
development in threatened plants.
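
The SSR-mining step described in the Methods can be illustrated with a minimal sketch. This is not the QDD algorithm used in the study; it is a simple regular-expression scan for perfect dinucleotide and trinucleotide repeats. The function name `find_ssrs`, the minimum repeat counts, and the example sequence are all assumptions made for illustration.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative thresholds,
# not taken from the study): 6 repeats for di-, 5 for trinucleotide motifs.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # Capture one motif, then require at least (min_rep - 1) more copies.
        pattern = re.compile(
            r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1)
        )
        for match in pattern.finditer(seq):
            motif = match.group(2)
            if len(set(motif)) == 1:
                # Skip mononucleotide runs such as AAAAAAAAAAAA.
                continue
            repeat_count = len(match.group(1)) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return sorted(hits)


# Hypothetical EST fragment containing (AG)8 and (CTT)6.
example = "TTGCA" + "AG" * 8 + "CCGTA" + "CTT" * 6 + "GGA"
print(find_ssrs(example))  # [(5, 'AG', 8), (26, 'CTT', 6)]
```

A dedicated tool would additionally handle imperfect and compound repeats, redundancy among ESTs, and primer design in the flanking regions, none of which this sketch attempts.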
Provider:
Dryad
Created:
2015-10-12



