Implications of methodologies for integrating empirical kinships into ex situ population management using PMx: a case study of Baer’s Pochard (Aythya baeri) in North America

Indexed by NIAID Data Ecosystem on 2026-05-01

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.6t1g1jx3m

Description:

In this study, we aimed to understand the implications of integrating empirical kinships into the genetic management of an ex situ population of the endangered waterfowl, Baer’s pochard (Aythya baeri), in North America. Single nucleotide polymorphism (SNP) data were generated for 141 Baer’s pochard using double digest restriction site-associated DNA sequencing (ddRAD), and empirical kinships were derived and integrated into the population management software PMx. We compared three different scenarios for applying empirical kinships within PMx: 1) no empirical kinships applied, 2) empirical kinships applied for pedigree terminals, and 3) empirical kinships applied for the entire population of pedigree terminals and descendants. We determined that most genetic summary statistics were impacted through the calculation of the population’s mean kinship, which increased significantly after empirical kinships were integrated into our analyses. Our results also revealed the importance of understanding how molecular kinships derived from a particular estimator are scaled, particularly if the scale differs significantly from that of pedigree-based kinships. We describe the theory behind the genetic metrics impacted, provide general guidance on incorporating empirical kinships into ex situ population management, and suggest sampling strategies to minimize the biases inherent in merging two types of kinship estimators.

Methods

DNA extracted from whole blood from each of the 141 sampled individuals was sent to the Genomic and Bioinformatics Service, Texas A&M AgriLife Research laboratory for ddRAD library preparation and sequencing. A total of 500 ng of DNA from each individual was provided. Paired-end sequences approximately 150 bp in length were produced on a single lane of the Illumina NovaSeq 6000 S2 X platform. Demultiplexed sequence data were obtained in the form of compressed fastq files (fastq.gz), representing raw paired-end sequencing reads.
Data filtering and SNP discovery were performed in STACKS v2.41 (Catchen et al. 2013, Rochette et al. 2019). Initially, sequence data were cleaned using the program process_radtags by removing reads with an uncalled base or low quality score (raw phred score <10). A custom bioinformatics pipeline (available at https://github.com/apwilder/StacksParameterSelection) was then used to select optimal parameters (m, M, n) for STACKS based on the guidelines of Paris et al. (2017). The pipeline ran iterations of the STACKS de novo program, varying one parameter at a time (m, M, or n) while holding the other two parameters constant at default settings (m=3, M=2, n=1). Parameter values tested for the maximum distance allowed between stacks (-M) ranged from 2 to 5, and those for the minimum depth of coverage required to create a stack (-m) ranged from 1 to 5. A catalog was assembled from consensus loci, with the number of mismatches allowed between sample loci when building the catalog (-n) tested from 1 to 5. Parameter values that maximized the number of total loci, polymorphic loci, and SNPs genotyped in at least 80% of individuals (r = 0.80) were then used for downstream analyses. Filtered reads were aligned into identical sequences or ‘stacks’, and putative loci were then identified de novo by comparing stacks. Putative loci (sets of stacks) were then matched against the catalog. Reads from each sample were aligned one locus at a time to identify SNPs across the entire sample set for each locus, genotyping each individual at each SNP. Finally, SNPs were further filtered with a minor allele frequency (MAF) cut-off of 0.02 to remove potential SNPs that might have been generated by genotyping error, and loci shared by at least 90% of the population were retained for further analyses (r = 0.90).
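The final filtering step can be illustrated with a minimal Python sketch. It is not the STACKS implementation used in the study: the function name `filter_snps`, the 0/1/2 genotype coding, and the toy data are assumptions made for illustration, and for simplicity both thresholds are applied per SNP, whereas the study applied the 90% threshold to loci.

```python
from typing import Dict, List, Optional

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def filter_snps(
    genotypes: Dict[str, List[Genotype]],
    min_maf: float = 0.02,
    min_call_rate: float = 0.90,
) -> List[str]:
    """Return IDs of SNPs that pass both the MAF and the call-rate thresholds."""
    kept = []
    for snp_id, calls in genotypes.items():
        called = [g for g in calls if g is not None]
        if not calls or not called:
            continue  # nothing to evaluate for this SNP
        # Fraction of individuals with a genotype call at this SNP.
        call_rate = len(called) / len(calls)
        # Alternate allele frequency among called (diploid) genotypes.
        alt_freq = sum(called) / (2 * len(called))
        maf = min(alt_freq, 1.0 - alt_freq)
        if call_rate >= min_call_rate and maf >= min_maf:
            kept.append(snp_id)
    return kept


# Hypothetical toy data for ten individuals.
toy = {
    "snp_a": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],        # polymorphic, fully called
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic, MAF = 0
    "snp_c": [1, 1, None, None, 0, 2, 1, 0, 1, 2],  # call rate 0.80
}
print(filter_snps(toy))
```

In this toy example only `snp_a` passes: `snp_b` fails the MAF cut-off and `snp_c` is genotyped in fewer than 90% of individuals.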
A higher r-value than that used for parameter selection was ultimately chosen for the final dataset because the number of available SNPs supported identifying a more consistent pool of loci across individuals for downstream analyses. The KING algorithm in PLINK v2.00a was used to calculate pairwise KING-robust kinships between individuals in the dataset.
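To show what a pairwise KING-robust kinship measures, the following minimal sketch implements a commonly cited form of the estimator for a single pair of individuals. It is an illustration under stated assumptions, not PLINK's code: the function name `king_robust_kinship`, the 0/1/2 genotype coding, and the handling of missing calls (skipping any locus where either individual is missing) are choices made here, and PLINK's implementation may differ in its details.

```python
from typing import List, Optional

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def king_robust_kinship(g_i: List[Genotype], g_j: List[Genotype]) -> float:
    """Kinship estimate for one pair: 0.5 - sum((g_i - g_j)^2) / (4 * min(het_i, het_j))."""
    if len(g_i) != len(g_j):
        raise ValueError("genotype vectors must cover the same SNPs")
    het_i = het_j = 0
    sq_diff = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # assumption: use only SNPs called in both individuals
        het_i += int(a == 1)
        het_j += int(b == 1)
        # 1 for het vs. homozygote, 4 for opposite homozygotes, 0 if identical.
        sq_diff += (a - b) ** 2
    min_het = min(het_i, het_j)
    if min_het == 0:
        raise ValueError("estimator undefined when an individual has no heterozygous calls")
    return 0.5 - sq_diff / (4 * min_het)


# Hypothetical toy genotypes.
same = king_robust_kinship([0, 1, 2, 1], [0, 1, 2, 1])      # identical genotypes
opposite = king_robust_kinship([0, 1, 2, 1], [2, 1, 0, 1])  # opposite homozygotes
print(same, opposite)
```

Identical genotypes give 0.5, while pairs with many opposite homozygotes give negative values. Pedigree-based kinships cannot be negative, which is one concrete way the two scales can differ when empirical kinships are merged with a pedigree.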

Created:

2023-11-02