
Time is of the essence: using archived samples to develop a GT-seq panel to preserve continuity of ongoing genetic monitoring

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.4mw6m90mv
Description:
For the past 25 years, genetic monitoring of Rio Grande silvery minnow (Hybognathus amarus) has been conducted annually using nine microsatellite loci. Recently, a temporal genome-wide microhaplotype dataset was generated with nextRAD-seq (a reduced-representation sequencing approach) from archived samples spanning 20 years, which allowed results from the two datasets to be compared (Osborne et al. 2022). To develop a GT-seq panel able to track past genomic changes across the time series, and thereby ensure the continuity of the ongoing genetic monitoring, we first identified loci from the nextRAD-seq data using a new conspecific reference genome. The final dataset included 2,983 loci and 379 individuals (nextRAD_complete dataset). From those, we selected a subset of 500 loci with the highest power to track the changes identified with the genome-wide data for GT-seq PCR multiplex optimization. We also included the sex-linked marker HAM06 from Caeiro-Dias et al. (2023) in the panel optimization. After four rounds of panel optimization, we retained 284 loci: 283 loci plus the sex-linked marker. The optimized panel was used to genotype 118 samples from eight temporal collections (a subset of nextRAD_complete) to validate the 283 loci in the GT-seq panel; an additional 20 samples of known sex were used to assess the accuracy of sex assignment with the sex-linked marker genotyped with the GT-seq panel. The nextRAD_complete dataset is provided as a VCF file (single SNPs) and a GENEPOP file (microhaplotypes). The GT-seq_283 dataset is provided as a GENEPOP file (microhaplotypes).

Methods

Genome-wide SNP identification was performed using Nextera-tagmented reductively-amplified DNA sequencing (nextRAD-seq; Russello et al., 2015) data from 379 individuals reported in Osborne et al. (2023), comprising 12 temporal collections that spanned 20 years. Microhaplotypes were identified using the methods described in Osborne et al. (2023), with four modifications: (1) nextRAD loci were identified using the draft genome sequenced for this study; (2) no depth-of-coverage filter was applied to nextRAD loci before variant calling; (3) loci were discarded if their mean depth of coverage was lower than 20; and (4) only individuals with less than 25% missing data were retained. Microhaplotypes and individuals retained after all filtering steps are referred to as the nextRAD_complete dataset. The nextRAD_complete dataset is provided as a VCF file containing single SNPs and as a GENEPOP file containing the haplotyped SNPs (microhaplotypes).

The optimized GT-seq panel excluding the sex-linked marker (GT-seq_283) was used to genotype 118 samples from eight temporal collections. Those samples are also included in the nextRAD_complete dataset. The GTscore pipeline v. 1.3 (https://github.com/gjmckinney/GTscore) was used to identify genotypes. In-silico probes were designed for each SNP to include eight flanking nucleotides and to include variants where they overlapped identified SNPs (see the manual for details on probe design: https://github.com/gjmckinney/GTscore/blob/master/GTScoreDocumentation%20V1.3.docx). The AmpliconReadCounter.pl script was used to count the number of unique reads per individual, to identify on-target reads, and to count the number of reads containing each SNP allele for every individual. The per-individual counts of reads containing each SNP allele were then used for microhaplotype genotyping with the maximum likelihood algorithm described by McKinney et al. (2018) and implemented in the GTscore.R script. Only individuals genotyped for at least 70% of the loci were kept in the dataset, which resulted in 72 individuals from five temporal collections being retained. Missing data across loci did not exceed 30%. The GT-seq_283 dataset is provided as a GENEPOP file (microhaplotypes).
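
The final filtering step described above (keeping only individuals genotyped for at least 70% of the loci, then checking missing data per locus) can be illustrated with a minimal Python sketch. This is not code from GTscore or from the dataset authors; the data layout (a dictionary mapping each individual to a list of genotype calls, with `None` for a missing call), the function names, and the toy genotypes are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical layout: individual ID -> one genotype call per locus.
# None marks a missing genotype. This is an illustrative structure,
# not the GENEPOP format or any GTscore output format.
Genotypes = Dict[str, List[Optional[str]]]


def call_rate(calls: List[Optional[str]]) -> float:
    """Fraction of loci with a non-missing genotype for one individual."""
    return sum(c is not None for c in calls) / len(calls)


def filter_individuals(genotypes: Genotypes, min_call_rate: float = 0.70) -> Genotypes:
    """Keep individuals genotyped for at least `min_call_rate` of the loci."""
    return {
        ind: calls
        for ind, calls in genotypes.items()
        if call_rate(calls) >= min_call_rate
    }


def locus_missingness(genotypes: Genotypes) -> List[float]:
    """Fraction of individuals with a missing genotype at each locus."""
    n_ind = len(genotypes)
    n_loci = len(next(iter(genotypes.values())))
    return [
        sum(calls[j] is None for calls in genotypes.values()) / n_ind
        for j in range(n_loci)
    ]


# Toy data (invented): three individuals, ten loci.
toy: Genotypes = {
    "ind_A": ["0101"] * 10,               # 10/10 loci called
    "ind_B": ["0102"] * 7 + [None] * 3,   # 7/10 loci called
    "ind_C": ["0202"] * 6 + [None] * 4,   # 6/10 loci called
}

kept = filter_individuals(toy, min_call_rate=0.70)
per_locus_missing = locus_missingness(kept)
```

With the toy data, `ind_C` falls below the 70% threshold and is dropped, and `per_locus_missing` can then be compared against a per-locus cutoff such as the 30% figure reported above.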
Created:
2025-02-06