Recent origin of a range-restricted species with subsequent introgression in its widespread congener in the Phyteuma spicatum group (Campanulaceae)

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.8gtht76x2

Resource description:

Understanding the causes of restricted geographic distributions is of major interest to evolutionary and conservation biologists. Inferring historical factors has often relied on ad hoc interpretations of genetic data, and hypothesis testing within a statistical framework under different demographic scenarios remains underutilized. Using coalescent modeling on RAD-sequencing data, we (i) test hypotheses about the origin of Phyteuma gallicum (Campanulaceae), a range-restricted endemic of central France sympatric with its widespread congener P. spicatum, and (ii) date its origin, irrespective of its mode of origin, to test the hypothesis that the restricted range is due to a recent time of origin. The best-supported model of origin is a dichotomous split between P. gallicum, confirmed as a distinct species, and the Central European P. nigrum, with subsequent gene flow between P. gallicum and P. spicatum. The split of P. gallicum and P. nigrum is estimated at 45,000–55,000 years ago. Coalescent modeling on genomic data not only clarified the mode of origin (dichotomous speciation instead of hybridogenic origin) but also identified recency of speciation as a sufficient, though likely not the sole, factor to explain the restricted distribution range. Coalescent modeling strongly improves our understanding of the evolution of range-restricted species, which are frequently of conservation concern, as is the case for P. gallicum.

Methods

Leaf material along with voucher specimens from several sampling sites for each study taxon was collected in the field and stored in silica gel. DNA was extracted using the Invisorb Spin Plant Mini Kit (Invitec Molecular, Berlin, Germany) following the manufacturer's protocol with one modification. The genomic extracts were cleaned with NucleoSpin gDNA Clean-up. RAD libraries were prepared using a custom protocol. The reads were demultiplexed and quality filtered using BamIndexDecoder from illumina2bam 1.03 and process_radtags from Stacks2 2.41.
Assembly was done using the denovo_map.pl script from Stacks2. We retained polymorphic RADtags with a maximum of 60% missing data across individuals and a maximum of ten SNPs per locus, using a combination of populations from Stacks2 and a custom script. Additionally, we retained only loci identified as belonging to a spermatophyte by blasting the RADtags against the BLAST databases. We then mapped the raw reads of all individuals against an artificial reference comprising the filtered RADtags using Bowtie2 2.3.4.1. The SAM files were converted to BAM files and sorted by reference coordinates, and read groups were added using Picard 2.20.1. Realignment around indels was done using the Genome Analysis Toolkit (GATK) 3.8. Final genotypes were called using ref_map.pl from Stacks2.

The called SNPs were filtered into two datasets: one, for exploratory purposes, comprising all investigated taxa including P. pyrenaicum and P. × adulterinum (hereafter the "complete dataset"), and a second, for coalescent modeling, comprising only samples from P. gallicum, P. nigrum and P. spicatum (the "reduced dataset"). The complete dataset was filtered as follows: for each locus with at most ten SNPs, a single random SNP was extracted using populations from Stacks2, provided that it had at most 50% missing data, had a maximum observed heterozygosity of 50%, and was variable in more than one individual (i.e., was not a singleton). The reduced dataset was further filtered by retaining only SNPs from loci with a coverage of at least six using vcftools 0.1.15 and by requiring that every SNP be present in at least two individuals per sampling site. Finally, loci that had an allele frequency of exactly 0.5:0.5 over the entire dataset were removed, since their minor allele cannot be determined. A Neighbor-net was calculated using the Hasegawa-Kishino-Yano substitution model with empirical frequencies in SplitsTree4 4.16.
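The per-SNP criteria described above (missing data, observed heterozygosity, singleton exclusion, and removal of exactly balanced loci) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used populations from Stacks2, vcftools and a custom script; the function name `passes_filters`, the genotype encoding (0, 1 or 2 alternate-allele copies, `None` for missing) and the reading of "not a singleton" as "minor allele carried by at least two individuals" are assumptions made for illustration only.

```python
from typing import Optional, Sequence

# Hypothetical encoding: 0, 1 or 2 copies of the alternate allele; None = missing.
Genotype = Optional[int]


def passes_filters(
    genotypes: Sequence[Genotype],
    max_missing: float = 0.5,
    max_obs_het: float = 0.5,
    drop_balanced: bool = False,
) -> bool:
    """Return True if a biallelic SNP passes the illustrative filters."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False

    # Proportion of individuals without a genotype call.
    if 1 - len(called) / len(genotypes) > max_missing:
        return False

    # Observed heterozygosity among called individuals.
    if sum(g == 1 for g in called) / len(called) > max_obs_het:
        return False

    alt_count = sum(called)            # alternate allele copies
    ref_count = 2 * len(called) - alt_count

    # Optional step for the reduced dataset: drop exact 0.5:0.5 loci,
    # whose minor allele cannot be determined.
    if drop_balanced and alt_count == ref_count:
        return False

    # Assumed reading of "not a singleton": the minor allele must be
    # carried by at least two individuals.
    alt_carriers = sum(g > 0 for g in called)
    ref_carriers = sum(g < 2 for g in called)
    minor_carriers = alt_carriers if alt_count < ref_count else ref_carriers
    return minor_carriers >= 2
```

For example, `passes_filters([0, 0, 1, 2])` keeps the SNP, whereas `passes_filters([0, 0, 0, 1])` rejects it because the alternate allele occurs in a single individual.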
We also ordinated the samples by calculating principal components (PCs) for both datasets using the principal component analysis (PCA) function in the R (3.6.3) package adegenet 2.1.3. Finally, we analyzed the genetic structure between the taxa using sNMF (Frichot et al. 2014) as implemented in the R package LEA 2.6.0.

As a summary statistic for coalescent modeling, we used site frequency spectra (SFS) for each deme (gene pool) as identified with sNMF and PCA. To avoid potential biases of the SFS from missing data, we first down-sampled the 6–8 haploid genotypes (corresponding to 3–4 diploid individuals) of each sampling site by randomly sampling, for each locus, four (present) haploid genotypes from across all individuals within the respective sampling site, thus ensuring that no missing data is present in the dataset. Both the subsampling and the calculation of the folded 2D SFS were done using SFS-scripts. Coalescent modeling was done using FastSimCoal 2.7 (Excoffier et al. 2021).

To derive an informed interval to be used as a sampling range for the parameter determining the divergence time of the common ancestor (TDIVANC), we calculated the maximum and minimum genetic distance between P. nigrum and P. spicatum. The best substitution model was selected using jModelTest 2.1.10, and SplitsTree 4.16 was used to calculate a distance matrix, which was then converted to generations using the mutation rate of Arabidopsis thaliana (5.9E-09 substitutions/site/generation).

To obtain a 95% confidence interval (CI) for the estimated age parameters of the best-fitting model (time of divergence of the ancestor of all taxa and time of origin of P. gallicum), we used the parametric bootstrapping approach of Ye et al. (2020). Briefly, we used FastSimCoal 2.7 to simulate 100 pseudoreplicate SFS under the parameters of the best-fitting run.
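The down-sampling and folding step described above can be sketched in Python as follows. This is a simplified illustration and not the SFS-scripts implementation: each deme is represented here by a single pool of haploid alleles per locus, the names `downsample` and `folded_2d_sfs` are hypothetical, and loci at which the two alleles are exactly balanced are skipped, in line with the removal of 0.5:0.5 loci mentioned in the filtering description.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding: 0 = reference allele, 1 = alternate allele, None = missing.
Allele = Optional[int]
Locus = Tuple[Sequence[Allele], Sequence[Allele]]  # (deme 1 alleles, deme 2 alleles)


def downsample(alleles: Sequence[Allele], n: int, rng: random.Random) -> Optional[List[int]]:
    """Randomly draw n non-missing haploid genotypes, or None if too few are present."""
    present = [a for a in alleles if a is not None]
    if len(present) < n:
        return None
    return rng.sample(present, n)


def folded_2d_sfs(loci: Sequence[Locus], n: int = 4, seed: int = 0) -> List[List[int]]:
    """Build a folded joint SFS; entry [i][j] counts loci with i and j minor alleles."""
    rng = random.Random(seed)
    sfs = [[0] * (n + 1) for _ in range(n + 1)]
    for deme1, deme2 in loci:
        s1 = downsample(deme1, n, rng)
        s2 = downsample(deme2, n, rng)
        if s1 is None or s2 is None:
            continue  # locus cannot be represented without missing data
        i, j = sum(s1), sum(s2)  # alternate allele counts per deme
        if i + j == n:
            continue  # exactly 0.5:0.5 over 2n alleles: minor allele undefined
        if i + j > n:
            i, j = n - i, n - j  # fold: count the minor allele instead
        sfs[i][j] += 1
    return sfs
```

Because every locus contributes exactly `n` alleles per deme after down-sampling, all loci fall on the same (n+1) × (n+1) grid regardless of how many genotypes were originally missing.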
Each of these pseudoreplicate SFS was then used as observed data for another 100 replicate analyses using the same settings as for the original data (100,000 coalescent simulations and 60 optimization cycles). The best-fitting replicate was then selected for each simulated SFS and used to calculate the mean parameter estimates and the 95% CIs.
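The final summarization step of the parametric bootstrap can be sketched as follows. The sketch assumes that each replicate analysis yields a log-likelihood and a point estimate of one age parameter, and that the 95% CI is taken as the 2.5th and 97.5th percentiles of the best-fitting estimates. The percentile method and the name `bootstrap_summary` are assumptions for illustration; the description above does not state how the interval was computed.

```python
from statistics import mean, quantiles
from typing import List, Sequence, Tuple

# Hypothetical record per replicate analysis: (log-likelihood, parameter estimate).
Run = Tuple[float, float]


def bootstrap_summary(pseudoreplicates: Sequence[Sequence[Run]]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) of the best-fitting estimates across pseudoreplicates."""
    best: List[float] = []
    for runs in pseudoreplicates:
        # Keep the estimate from the replicate with the highest likelihood.
        _, estimate = max(runs, key=lambda run: run[0])
        best.append(estimate)

    # With n=40, the first and last cut points are the 2.5th and 97.5th percentiles.
    cuts = quantiles(best, n=40, method="inclusive")
    return mean(best), cuts[0], cuts[-1]
```

In the setting described above, `pseudoreplicates` would hold 100 entries (one per simulated SFS), each containing the 100 replicate analyses run on that SFS.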

Created:

2024-11-24