Data from: Genomic exploration and molecular marker development in a large and complex conifer genome using RADseq and mRNAseq
Source: DataCite Commons; updated 2025-05-01; indexed 2025-05-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.jr08b
Resource description:
We combined restriction site associated DNA sequencing (RADseq) using a
hypomethylation-sensitive enzyme and messenger RNA sequencing (mRNAseq) to
develop molecular markers for the 16 gigabase genome of Cedrus atlantica,
a conifer tree species. With each method, Illumina® reads from one
individual were used to generate de novo assemblies. SNPs from the RADseq
data set were detected in a panel comprising one individual and three pools
of three individuals each. We developed a flexible script to estimate the
ascertainment bias in SNP detection considering the pooling and sampling
effects on the probability of not detecting an existing polymorphism. Gene
Ontology (GO) and transposable element (TE) search analyses were applied
to both data sets. The RADseq and the mRNAseq assemblies represented 0.1%
and 0.6% of the genome, respectively. Genome complexity reduction resulted
in 17% of the RADseq contigs potentially coding for proteins. This rate
was doubled in the mRNAseq data set, suggesting that RADseq also explores
noncoding low-repeat regions. The two methods gave very similar GO-slim
profiles. As expected, the two assemblies were poor in TE-like sequences
(<4% of contig length). We identified 17,348 single nucleotide
polymorphisms (SNPs) in the RADseq data set and 5,714 simple sequence
repeats (SSRs) in the transcriptome. A subset of 282 SNPs was validated
using the Fluidigm genotyping technology, giving a conversion rate of
50.4%, falling within the expected range for conifers. Increasing sample
size had the greatest effect on reducing ascertainment bias. These
results validated the utility of the RADseq approach for highly complex
genomes such as conifers.
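
The description mentions estimating ascertainment bias from the probability of not detecting an existing polymorphism under pooling and sampling. The sketch below is not the authors' script. It is a minimal illustration under stated assumptions: diploid individuals drawn at random from a population in Hardy-Weinberg proportions, equal contribution of every chromosome to a pool, and a hypothetical detection rule requiring at least `min_reads` reads carrying the minor allele in at least one sequenced unit. The function names and the `depth` and `min_reads` parameters are illustrative, not taken from the dataset.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_miss_sampling(maf, n_individuals):
    """Sampling effect only: probability that all 2n sampled chromosomes
    carry the same allele, so the SNP is invisible in the sample."""
    n_chrom = 2 * n_individuals
    return (1 - maf) ** n_chrom + maf**n_chrom


def p_miss_unit(maf, n_individuals, depth, min_reads):
    """Sampling plus sequencing effect for one unit (a single individual or
    a pool): probability that fewer than min_reads reads show the minor
    allele. Assumes every chromosome contributes equally to the reads."""
    n_chrom = 2 * n_individuals
    total = 0.0
    for copies in range(n_chrom + 1):
        freq = copies / n_chrom  # minor-allele frequency inside the unit
        p_too_few = sum(binom_pmf(j, depth, freq) for j in range(min_reads))
        total += binom_pmf(copies, n_chrom, maf) * p_too_few
    return total


def p_miss_panel(maf, unit_sizes, depth, min_reads):
    """Probability that no unit in the panel reveals the minor allele,
    treating units as independent draws from the population."""
    prob = 1.0
    for size in unit_sizes:
        prob *= p_miss_unit(maf, size, depth, min_reads)
    return prob
```

For a panel shaped like the one described here (one individual plus three pools of three), a call such as `p_miss_panel(0.1, [1, 3, 3, 3], depth=20, min_reads=2)` gives the miss probability for a SNP with minor-allele frequency 0.1 under these assumptions. The depth and read threshold are placeholder values, not the ones used in the study.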
Provider:
Dryad
Created:
2014-09-11



