Data from: Marker development for phylogenomics: the case of Orobanchaceae, a plant family with contrasting nutritional modes
DataCite Commons · Updated 2025-06-01 · Indexed 2025-06-15
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.7b86c
Description:
Phylogenomic approaches, employing next-generation sequencing (NGS)
techniques, have revolutionized systematic and evolutionary biology.
Target enrichment is an efficient and cost-effective method in
phylogenomics and is becoming increasingly popular. Depending on
availability and quality of reference data as well as on biological
features of the study system, (semi-)automated identification of suitable
markers will require specific bioinformatic pipelines. Here, we
established a highly flexible bioinformatic pipeline, BaitsFinder, to
identify putative orthologous single copy genes (SCGs) and to construct
bait sequences in a single workflow. The pipeline was also designed to
cope with challenging data sets, such as the
nutritionally heterogeneous plant family Orobanchaceae. To this end, we
used transcriptome data of differing quality available for four
Orobanchaceae species and, as reference, SCG data from monkeyflower
(Erythranthe guttata, syn. Mimulus g.; 1,915 genes) and tomato (Solanum
lycopersicum; 391 genes). Depending on whether gaps were permitted in
initial BLAST searches of the four Orobanchaceae species against the
reference, our pipeline identified 1,307 and 981 SCGs with average lengths
of 994 bp and 775 bp, respectively. Automated bait sequence construction
(using 2× tiling) resulted in 38,170 and 21,856 bait sequences,
respectively. Compared with the recently published MarkerMiner 1.0
pipeline, BaitsFinder identified about 1.6 times as many SCGs (of at least
900 bp in length). With the Orobanchaceae-specific steps skipped,
BaitsFinder was successfully used in a group of non-parasitic plants
(three Asteraceae species and, as reference, SCG data from Arabidopsis
thaliana based on previously compiled SCGs). Thus, BaitsFinder is expected
to be broadly applicable to groups for which only transcriptomes or partial
genome data of differing quality are available.
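The description reports bait construction "using 2× tiling" but does not state bait length or step size. The following is a minimal sketch of what 2× tiling generally means, not BaitsFinder's actual code: it assumes 120-bp baits (an assumed value; the description gives no bait length) and a step of half the bait length, so adjacent baits overlap by 50% and interior bases are covered at least twice. The function name `tile_baits` and its parameters are hypothetical.

```python
def tile_baits(seq: str, bait_len: int = 120, coverage: int = 2) -> list[str]:
    """Hypothetical sketch: split a marker sequence into overlapping baits.

    With step = bait_len // coverage, every base is covered at least once
    and interior bases are covered at least `coverage` times.
    bait_len=120 is an assumption, not a value from BaitsFinder.
    """
    if bait_len % coverage:
        raise ValueError("bait_len must be divisible by coverage")
    step = bait_len // coverage  # 60 bp for 120-bp baits at 2x tiling

    # Sequences shorter than one bait are returned as a single (short) bait.
    if len(seq) <= bait_len:
        return [seq]

    # Regularly spaced windows from the 5' end.
    last_start = len(seq) - bait_len
    baits = [seq[i:i + bait_len] for i in range(0, last_start + 1, step)]

    # If the regular windows stop short of the 3' end, add one bait flush with it.
    if last_start % step:
        baits.append(seq[last_start:])
    return baits


if __name__ == "__main__":
    example = "ACGT" * 75  # hypothetical 300-bp marker sequence
    baits = tile_baits(example)
    print(len(baits))  # 4 baits starting at 0, 60, 120 and 180
```

Anchoring an extra bait at the 3' end keeps the terminal bases covered when the sequence length is not a multiple of the step, at the cost of some extra overlap near the end.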
Provider:
Dryad
Created:
2017-11-07



