
Establishing species boundaries in Bornean geckos

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.w0vt4b90w
Description:
Species delimitation using mitochondrial DNA (mtDNA) remains an important and accessible approach for discovering and delimiting species. However, delimiting species with a single locus (e.g., DNA barcoding) is biased towards overestimating species diversity. The highly diverse gecko genus Cyrtodactylus is one such group where delimitation using mtDNA remains the paradigm. In this study, we use genomic data to test putative species boundaries established using mtDNA within three recognized species of Cyrtodactylus on the island of Borneo. We predict that multilocus genomic data will estimate fewer species than mtDNA, which could have important ramifications for estimates of species diversity within the genus. We aim to 1) investigate the correspondence between species delimitations using mtDNA and genomic data; 2) infer species trees for each target species; and 3) quantify gene flow and identify migration patterns to assess population connectivity. We show that mtDNA approaches overestimate species diversity compared to genomic methods, underscoring the value of using genomic data to reassess mtDNA-based species delimitations for taxa lacking clear species boundaries. We expect the number of recognized species within Cyrtodactylus to continue increasing, but when possible, genomic data should be included to inform more accurate species boundaries.

Methods

To generate genomic data, we conducted double digest restriction-site associated DNA sequencing (ddRADseq). We double-digested each sample using the restriction enzymes SbfI and MspI in CutSmart Buffer (New England Biolabs) for 7 hours at 37 °C. For fragment purification, we used Sera-Mag SpeedBeads. We then ligated eight distinct barcodes to the cut sites of the fragmented DNA and subsequently size-selected each library (between 415 and 515 bp after accounting for adapter length) on a Blue Pippin Prep size fractionator (Sage Science).
For the final library amplification, we used Phusion High-Fidelity DNA Polymerase and Illumina's indexed primers. We determined the concentration and size distribution of each indexed pool using an Agilent 2200 TapeStation. Lastly, we sent the quantified pools to QB3-Berkeley Genomics, UC Berkeley for qPCR to determine sequenceable library concentrations before multiplexing equimolar amounts of each pool for sequencing on two Illumina HiSeq 4000 lanes (51-bp, single-end reads; 9 pools of 8 samples each). We combined these data with publicly available ddRADseq data. We removed individuals with fewer than 500,000 raw reads from each assembly. For all assemblies, we applied a sequence similarity threshold of 85% to cluster reads within samples and loci between samples. We removed consensus sequences with low coverage (< 6 reads), excessive undetermined or heterozygous sites (> 5%), too many alleles for a sample (> 2 for diploids), or an excess of shared heterozygosity among samples (paralog filter = 0.5). For clade-specific assemblies, we required approximately 60% of individuals to share any given locus. For population genetic analyses, we filtered each dataset using VCFtools to allow only one SNP per locus (--thin 50) and filtered out variable sites present in less than 5% of individuals (--maf 0.05). Using a reduced dataset for computational efficiency, we inferred time-calibrated species trees using SNAPP within the BEAST2 framework for both the small- and large-bodied datasets. We applied secondary calibrations to the roots of each tree using the snapp_prep Ruby script, which generates an .xml file with a molecular clock (Matschiner 2022).
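The two SNP filters described above (thinning and the minor allele frequency cutoff) can be sketched in plain Python. This is an illustration only, not the VCFtools implementation: the `Site` type and the function names are hypothetical, sites are assumed to be biallelic and diploid, and the order in which the two filters are applied here (frequency first, then thinning) is an assumption that the source does not state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A diploid genotype as two allele codes (0 = reference, 1 = alternate);
# None marks a missing call. Hypothetical representation for this sketch.
Genotype = Optional[Tuple[int, int]]


@dataclass
class Site:
    locus: str                 # RAD locus identifier
    pos: int                   # position of the SNP within the locus
    genotypes: List[Genotype]  # one entry per individual


def minor_allele_frequency(site: Site) -> float:
    """Frequency of the rarer allele among called alleles (biallelic assumption)."""
    alleles = [a for g in site.genotypes if g is not None for a in g]
    if not alleles:
        return 0.0
    alt = sum(1 for a in alleles if a != 0) / len(alleles)
    return min(alt, 1.0 - alt)


def thin(sites: List[Site], min_distance: int) -> List[Site]:
    """Keep sites so that retained sites on a locus are at least min_distance apart."""
    kept: List[Site] = []
    last_pos: Dict[str, int] = {}
    for s in sorted(sites, key=lambda s: (s.locus, s.pos)):
        prev = last_pos.get(s.locus)
        if prev is None or s.pos - prev >= min_distance:
            kept.append(s)
            last_pos[s.locus] = s.pos
    return kept


def filter_sites(sites: List[Site], min_distance: int = 50,
                 min_maf: float = 0.05) -> List[Site]:
    """Drop low-frequency variants, then thin (order is an assumption)."""
    frequent = [s for s in sites if minor_allele_frequency(s) >= min_maf]
    return thin(frequent, min_distance)
```

With 51-bp reads, a thinning distance of 50 leaves room for at most one retained SNP per locus, which is consistent with the one-SNP-per-locus goal stated in the text.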
For the large-bodied dataset, we constrained the crown age of the tree to 18.83 mya with a normal distribution (sigma = 2), and for the small-bodied clade we constrained the crown age to 25.80 mya with a normal distribution (sigma = 2). To maximize the number of loci from the ddRADseq dataset, we generated separate assemblies for each Bornean clade (large- and small-bodied). For the time-calibrated species tree in BEAST2, we thinned the number of samples in our dataset to 32 small-bodied and 28 large-bodied individuals for computational efficiency.
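To give a sense of how much uncertainty the crown-age calibrations above allow, the following sketch computes the central 95% interval of each normal prior. It assumes that sigma is the standard deviation in millions of years; the function name and the dictionary are illustrative and are not part of snapp_prep or BEAST2.

```python
from statistics import NormalDist
from typing import Tuple


def central_interval(mean: float, sigma: float, mass: float = 0.95) -> Tuple[float, float]:
    """Return the bounds enclosing the central `mass` of a normal prior."""
    dist = NormalDist(mu=mean, sigma=sigma)
    tail = (1.0 - mass) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Crown-age calibrations as stated in the text: (mean in mya, sigma).
CALIBRATIONS = {
    "large-bodied": (18.83, 2.0),
    "small-bodied": (25.80, 2.0),
}

for clade, (mean, sigma) in CALIBRATIONS.items():
    lower, upper = central_interval(mean, sigma)
    print(f"{clade}: {lower:.2f} to {upper:.2f} mya")
```

Under these assumptions, the large-bodied prior places most of its weight between roughly 15 and 23 mya, and the small-bodied prior between roughly 22 and 30 mya.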
Created:
2024-08-13