Data and Code for: Reproductive strategies and their consequences for divergence, gene flow, and genetic diversity in three taxa of Clarkia
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.sxksn038b
Description:
Differences in reproductive strategies can have important implications for macro- and micro-evolutionary processes. We used a comparative approach through a population genetics lens to evaluate how three distinct reproductive strategies shape patterns of divergence among three closely related taxa in the genus Clarkia, as well as gene flow and genetic diversity within them. One taxon is a predominantly autonomous self-fertilizer and the other two taxa are predominantly outcrossing but vary in the primary pollinator they attract. Based on genotyping-by-sequencing of populations and comparison of loci shared across taxa, our results suggest that differences in reproductive strategies in part promote evolutionary divergence among these closely related taxa. Contrary to expectations, we found that the selfing taxon had the highest levels of heterozygosity but a low rate of polymorphism. The high levels of fixed heterozygosity for a subset of loci suggest this pattern is driven by the presence of structural rearrangements in chromosomes, which are common in other Clarkia taxa. In evaluating patterns within taxa, we found a complex interplay between reproductive strategy and geographic distribution. Differences in the mobility of primary pollinators did not translate to a difference in rates of genetic diversity and gene flow within taxa, a pattern likely due to one taxon having a patchier distribution and a less temporally and spatially reliable pollinator. Taken together, this work advances our understanding of the factors that shape gene flow and the distribution of genetic diversity within and among closely related taxa.
Methods
DNA extraction and sequencing
We extracted genomic DNA following a modified cetyltrimethylammonium bromide (CTAB) protocol developed by Doyle and Doyle (1987). We used single nucleotide polymorphisms (SNPs) generated from genotyping-by-sequencing libraries, which were prepared following Elshire et al. (2011) using the restriction enzyme ApeKI to fragment the genome. To avoid batch effects, the individuals from each population were split evenly between the two genomic libraries of 96 samples each. Each polymerase chain reaction (PCR) was carried out independently for all samples, and each library was then quantified using a high-sensitivity Qubit™ assay (dsDNA HS Assay Kit, Thermo Fisher Scientific) and pooled in the final step before sequencing to ensure that an equivalent amount of each sample was present in the final genomic library. Sequencing was performed on an Illumina HiSeq with 150 bp paired-end reads at the Center for Genetic Medicine at Northwestern Medicine.
Calling single nucleotide polymorphisms (SNPs)
We used STACKS v. 2.2 (Catchen et al. 2011; 2013) to call single nucleotide polymorphisms (SNPs) and generate four distinct datasets: a combined dataset to compare measures of genetic diversity and divergence among taxa, and one dataset per taxon for comparisons between populations within taxa. To evaluate divergence among C. concinna subsp. automixa, C. concinna subsp. concinna, and C. breweri, we called SNPs that were shared among at least two taxa (i.e., the combined dataset). Because the combined dataset resulted in many loci being monomorphic within one taxon but polymorphic in the others, it was necessary to call SNPs for each taxon separately to assess genetic diversity, inbreeding, and population structure within taxa. For the combined and separate datasets, the parameters -m, -M, -n, -max-locus-stacks, and -bound-high were optimized using four samples from each population run across lanes and changing one parameter at a time. The ‘best’ parameters were those that maximized the number of SNPs while minimizing genetic distance between samples from the same populations, as shown in a metric multi-dimensional scaling (MDS) plot generated using PLINK 2 (Purcell et al. 2007; Mastretta-Yanes et al. 2015). The parameters used to call SNPs varied for each dataset (Appendix A, Figures S2–5, Table S1a–b).
For the combined dataset, we built a catalog using all samples and labeled them by taxonomic assignment (C. concinna subsp. automixa, C. concinna subsp. concinna, and C. breweri) for the population map. We ran the ‘populations’ command in STACKS and only called loci that were present in at least two of the three taxa (-p 2) and in at least 50% of individuals in a taxon (-r 0.5), with a minor allele frequency greater than 0.05 (-maf 0.05) and one SNP per sequence. For the three datasets where SNPs were called separately for C. concinna subsp. automixa, C. concinna subsp. concinna, and C. breweri, we built catalogs with five samples from each population, choosing samples that had high numbers of reads, were collected across the population, and were sequenced on different plates. For the ‘populations’ command, we specified that loci needed to be present in at least 80% of individuals (-r 0.80), that the minor allele frequency needed to be greater than 0.05 (as suggested by Paris et al. 2017), and that one SNP per sequence was allowed. All datasets were then quality filtered for read depth, missing data, and Hardy-Weinberg Equilibrium (Appendix A). In total, 16 individuals failed to pass quality filtering, leaving a total sample size of 166 individuals, with 52 individuals of C. concinna subsp. concinna, 29 individuals of C. concinna subsp. automixa and 84 individuals of C. breweri.
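The locus filters described for the combined dataset can be illustrated with a minimal Python sketch. This is not STACKS code and does not reproduce how STACKS computes these quantities internally; the function name `passes_filters`, the input layout, and the assumption of biallelic SNPs are all hypothetical choices made for illustration.

```python
from collections import Counter

def passes_filters(genotypes, taxon_of, min_taxa=2, min_frac=0.5, min_maf=0.05):
    """Return True if one SNP passes the three filters described in the text.

    genotypes: dict mapping sample -> (allele, allele), or None if missing.
    taxon_of:  dict mapping sample -> taxon label (hypothetical layout).
    Assumes a biallelic SNP; this is an illustrative simplification.
    """
    # Count samples per taxon and how many of them were genotyped.
    totals = Counter(taxon_of.values())
    called = Counter(
        taxon for sample, taxon in taxon_of.items()
        if genotypes.get(sample) is not None
    )
    # A taxon "has" the locus if at least min_frac of its individuals are called.
    taxa_ok = sum(1 for t in totals if called[t] / totals[t] >= min_frac)
    if taxa_ok < min_taxa:
        return False
    # Minor allele frequency across all called genotypes.
    alleles = Counter(a for g in genotypes.values() if g is not None for a in g)
    if len(alleles) < 2:
        return False  # monomorphic
    maf = min(alleles.values()) / sum(alleles.values())
    return maf > min_maf
```

With the defaults above, a locus genotyped in half of the individuals of two taxa and absent from the third would be retained, provided it is polymorphic with a minor allele frequency above 0.05.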
Statistical analyses
Among taxa – Genetic divergence and diversity
We used the combined dataset to determine the amount of divergence among taxa with distinct mating systems. We used the program ADMIXTURE 1.3.0 (Alexander, Novembre, and Lange 2009) to evaluate population genetic structure among taxa by considering numbers of genetic clusters (K) from 1 to 10 and employing a cross-validation procedure. We considered the most appropriate value of K to be the one with the lowest cross-validation score or the K at the ‘knee’ of the cross-validation plot. We then used ADMIXTURE to calculate pairwise FST between the genetic clusters. In addition, we evaluated the divergence among groups by using the first two axes of a scaled and centered principal components analysis (PCA) generated with the R package adegenet (Jombart 2008). All analyses were conducted in R v. 4.0.2 (R Core Team 2020), unless noted otherwise.
We also used the combined dataset to evaluate patterns of genetic diversity of the loci shared between taxa. Using the STACKS populations output, we calculated the number and percent of polymorphic loci. The low population sample size precluded the use and testing of population-based measures of genetic diversity and inbreeding. However, robust sampling at the individual level enabled the use of the genhet() function in R to measure the individual-level proportion of heterozygous loci (PHt), or the number of heterozygous loci divided by the number of genotyped loci (Coulon 2010). We then used the stats package (R Core Team 2020) to test for taxon-based differences in PHt using a pairwise Wilcoxon rank sum test with a Bonferroni correction.
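As an illustration of the two quantities in this paragraph, the Python sketch below computes PHt for one individual and applies a Bonferroni correction to a list of p-values. It is not the genhet() implementation and it does not run the Wilcoxon test itself; the function names and the genotype layout are hypothetical.

```python
def proportion_heterozygous(genotypes):
    """PHt for one individual: heterozygous loci / genotyped loci.

    genotypes: list of (allele, allele) tuples, with None for missing loci
    (a hypothetical input layout chosen for this sketch).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")  # no genotyped loci, PHt undefined
    heterozygous = sum(1 for a, b in called if a != b)
    return heterozygous / len(called)

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With three taxa there are three pairwise comparisons, so each raw p-value would be multiplied by three under this correction.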
Within taxa – Genetic diversity, inbreeding, effective population size, and gene flow
Using the three datasets called for each taxon separately, we investigated inbreeding, genetic diversity, gene flow, and effective population size (NE). We again measured individual-level PHt as well as the inbreeding coefficient (F) using PLINK 2 and the --het command (Purcell et al. 2007). We tested for taxon-based differences in PHt and F using a pairwise Wilcoxon rank sum test with a Bonferroni correction. We then estimated NE of each population using the program NeEstimator v. 2.1 (Do et al. 2014). The linkage disequilibrium method, which calculates NE based on the amount of linkage disequilibrium within a population while correcting for sample size, was unable to estimate NE and its 95% confidence intervals (Waples and Do 2010). However, the heterozygote excess method estimated NEB, or the effective number of breeders, which gives reliable insight into NE when the effective population size is small (Zhdanova and Pudovkin 2008; Waples and Do 2010; Gilbert and Whitlock 2015). This method takes advantage of random differences in allele frequencies between parents in a small population, which result in an excess of heterozygote genotypes compared to expectations under Hardy-Weinberg Equilibrium (Pudovkin, Zaykin, and Hedgecock 1996). We tested for differences in NEB between C. concinna subsp. concinna and C. breweri using a Kruskal-Wallis rank sum test with the stats package but were unable to include C. concinna subsp. automixa in this assessment due to low sample size.
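The two estimators named in this paragraph can be sketched in Python as follows. The inbreeding coefficient uses the method-of-moments form reported by PLINK's --het output, F = (O - E) / (N - E), where O and E are observed and expected homozygous genotype counts and N is the number of non-missing sites. The NEB function uses the commonly cited form of the heterozygote excess estimator, 1/(2D) + 1/(2(D + 1)), with D the relative excess of observed over expected heterozygosity. NeEstimator's handling of multiple loci and sample-size corrections is not reproduced here, and the function names are hypothetical.

```python
def inbreeding_f(obs_hom, exp_hom, n_sites):
    """Method-of-moments F: (observed - expected homozygotes) / (sites - expected)."""
    return (obs_hom - exp_hom) / (n_sites - exp_hom)

def neb_het_excess(h_obs, h_exp):
    """Effective number of breeders from heterozygote excess (simplified sketch).

    d is the relative excess of observed over expected heterozygosity.
    With no excess (d <= 0) the estimate is treated as infinite.
    """
    d = (h_obs - h_exp) / h_exp
    if d <= 0:
        return float("inf")
    return 1 / (2 * d) + 1 / (2 * (d + 1))
```

A larger heterozygote excess yields a smaller NEB, which is why the method is informative mainly when few parents contribute to a cohort.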
We again used ADMIXTURE and PCA plots as described above to compare population genetic structure and gene flow between populations within taxa. In addition, we used the program GENEPOP to calculate pairwise FST (Weir & Cockerham 1984) between populations for each taxon. We then used an independent two-group Mann-Whitney U test to test for differences in pairwise FST between C. concinna subsp. concinna and C. breweri. We were unable to include C. concinna subsp. automixa because only two populations were sampled. We also evaluated patterns of isolation by distance for C. concinna subsp. concinna and C. breweri, the two taxa with sufficient sampling. We calculated a pairwise matrix of FST / (1 – FST) (Rousset 1997) between populations as well as a pairwise matrix of the log of geographic distance between populations. We then used both matrices in a Mantel test in the R package ade4 with 9999 replicates (Dray and Dufour 2007).
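The isolation-by-distance test can be sketched as a permutation Mantel test in plain Python. This is a simplified stand-in for the ade4 routine, not a reimplementation of it: the function names, the one-sided alternative, the Pearson statistic, and the fixed random seed are assumptions made for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(fst, dist, permutations=9999, seed=0):
    """Mantel test of FST/(1 - FST) against log geographic distance.

    fst, dist: symmetric square matrices (lists of lists), one row per population.
    Returns (observed correlation, one-sided permutation p-value).
    """
    n = len(fst)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo = [math.log(dist[i][j]) for i, j in pairs]

    def genetic(order):
        # Rousset's linearized FST, with populations relabeled by `order`.
        values = (fst[order[i]][order[j]] for i, j in pairs)
        return [v / (1 - v) for v in values]

    r_obs = pearson(genetic(list(range(n))), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # permute population labels of one matrix
        if pearson(genetic(order), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting population labels, rather than individual matrix cells, preserves the dependence among pairwise values that share a population, which is the reason a Mantel test is used instead of an ordinary correlation test.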
Created:
2023-09-11



