An evaluation of inbreeding measures using a whole genome sequenced cattle pedigree

Indexed by NIAID Data Ecosystem on 2026-03-12

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.vx0k6djq8

Description:

The estimation of the inbreeding coefficient (F) is essential for the study of inbreeding depression (ID) or for the management of populations under conservation. Several methods have been proposed to estimate the realized F using genetic markers, but it remains unclear which one should be used. Here we used whole-genome sequence data for 245 individuals from a Holstein cattle pedigree to empirically evaluate which estimators best capture homozygosity at variants causing ID, such as rare deleterious alleles or loci presenting heterozygote advantage and segregating at intermediate frequency. Estimators relying on the correlation between uniting gametes (FUNI) or on genomic relationships (FGRM) presented the highest correlations with these variants. However, homozygosity at rare alleles remained poorly captured. A second group of estimators, relying on excess homozygosity (FHOM), homozygous-by-descent segments (FHBD), runs of homozygosity (FROH) or the known genealogy (FPED), was better at capturing whole-genome homozygosity, which reflects the consequences of inbreeding on all variants, and homozygosity at young alleles with low to moderate frequencies. The results indicate that FUNI and FGRM might present a stronger association with ID. However, the situation might be different when recessive deleterious alleles reach higher frequencies, such as in populations with a small effective population size. For locus-specific inbreeding measures or at low marker density, the ranking of the methods can also change, as FHBD makes better use of the information from neighbouring markers. Finally, we confirmed that genomic measures are in general superior to pedigree-based estimates. In particular, FPED was uncorrelated with locus-specific homozygosity.

Methods

DNA samples were extracted from whole blood or semen using standard protocols. Sequencing was done on Illumina HiSeq 2000 instruments, using a PCR-free method to prepare libraries with a 550 bp insert size (DAMONA pedigree).
Paired-end sequencing was performed with a read length of 2 × 100 bp. The whole-genome sequence data were analyzed according to GATK Best Practices V3.4. Reads (FASTQ files) were aligned to the reference genome (Bos taurus UMD 3.1) with BWA MEM (version 0.7.9a-r786; Li 2013) using the default settings. PCR duplicates in the sorted BAM files were detected with sambamba (v0.4.6), and Picard tools and bedtools were used to generate library statistics and coverage information. The BAM files were then realigned around indels and recalibrated for base quality with the Genome Analysis Toolkit (GATK 2.7.4; DePristo et al. 2011). Known SNPs used for recalibration were obtained from dbSNP release 138. Variant calling was performed with the GATK HaplotypeCaller in N+1 mode. For calibration of variant quality, a set of trusted SNPs and indels was used. For SNPs, the set consisted of SNPs from the BovineHD (Illumina) and Axiom Genome-Wide BOS 1 (Affymetrix) commercial genotyping arrays. For indels, we selected a subset of indels identified in the DAMONA pedigree that behaved like true Mendelian variants: no parent-offspring incompatibilities (e.g. opposite homozygotes), no deviation from Hardy-Weinberg proportions (p > 0.05) and no deviation from expected genotypic proportions in the offspring of heterozygous parents (p > 0.05). In addition, we computed the probability of observing no parent-offspring inconsistency if parental alleles were drawn at random, and retained only indels for which this probability was below 1e-12 (to ensure that the absence of parent-offspring incompatibilities was not due to chance).
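The indel filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the DAMONA data: genotypes are coded as alternative-allele counts (0, 1, 2), parent-offspring pairs are hypothetical (parent, offspring) tuples, Hardy-Weinberg proportions are checked with a 1-df chi-square test, and the "by chance" probability assumes that each offspring genotype is drawn under Hardy-Weinberg proportions independently of the recorded parent. All function names are hypothetical, and the segregation test in offspring of heterozygous parents is omitted.

```python
import math
from typing import Sequence, Tuple

# Genotypes are coded as counts of the alternative allele: 0, 1 or 2.
Pair = Tuple[int, int]  # (parent genotype, offspring genotype)


def alt_allele_frequency(genotypes: Sequence[int]) -> float:
    """Alternative-allele frequency estimated from genotype counts."""
    return sum(genotypes) / (2.0 * len(genotypes))


def opposite_homozygotes(pairs: Sequence[Pair]) -> int:
    """Number of parent-offspring pairs that are opposite homozygotes (0/2 or 2/0)."""
    return sum(1 for parent, child in pairs if abs(parent - child) == 2)


def hwe_pvalue(genotypes: Sequence[int]) -> float:
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions."""
    n = len(genotypes)
    p = alt_allele_frequency(genotypes)
    q = 1.0 - p
    observed = [genotypes.count(g) for g in (0, 1, 2)]
    expected = [n * q * q, 2.0 * n * p * q, n * p * p]
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2.0))


def prob_no_inconsistency_by_chance(pairs: Sequence[Pair], p: float) -> float:
    """Assumed null model: each offspring genotype is drawn under HWE with
    alt-allele frequency p, independently of the recorded parent. Only
    homozygous parents can then yield an opposite homozygote."""
    prob = 1.0
    for parent, _child in pairs:
        if parent == 0:
            prob *= 1.0 - p * p  # offspring would be 2 with probability p^2
        elif parent == 2:
            prob *= 1.0 - (1.0 - p) ** 2  # offspring would be 0 with probability q^2
    return prob


def keep_indel(genotypes: Sequence[int], pairs: Sequence[Pair],
               alpha: float = 0.05, max_chance: float = 1e-12) -> bool:
    """Hypothetical filter combining three of the criteria in the text."""
    if opposite_homozygotes(pairs) > 0:
        return False  # at least one parent-offspring incompatibility
    if hwe_pvalue(genotypes) <= alpha:
        return False  # deviation from Hardy-Weinberg proportions
    p = alt_allele_frequency(genotypes)
    # Keep only if "no incompatibility" would be very unlikely by chance.
    return prob_no_inconsistency_by_chance(pairs, p) < max_chance
```

With an allele frequency of 0.5, each homozygous parent contributes a factor of 0.75, so roughly a hundred informative pairs are needed before the 1e-12 threshold is reached; this is why the criterion favours indels that are well represented in the pedigree.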

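The marker-based estimators FGRM, FUNI and FHOM compared in the abstract differ mainly in how each locus is weighted by its allele frequency. The sketch below assumes the per-locus formulas implemented in GCTA for FGRM (its Fhat1) and FUNI (its Fhat3), and the method-of-moments form (O - E) / (M - E) used by PLINK's `--het` for FHOM, ignoring small-sample corrections; the exact implementation used in the study is not described here. Allele frequencies `p` are assumed to be known (for example estimated in the sample), and monomorphic loci must be removed beforehand.

```python
from typing import Sequence

# x: genotypes coded as alternative-allele counts (0, 1, 2)
# p: alternative-allele frequencies, assumed known and strictly between 0 and 1


def _validate(x: Sequence[int], p: Sequence[float]) -> None:
    if len(x) != len(p) or not x:
        raise ValueError("x and p must be non-empty and of equal length")
    if any(not 0.0 < pi < 1.0 for pi in p):
        raise ValueError("monomorphic loci (p = 0 or 1) must be removed first")


def f_grm(x: Sequence[int], p: Sequence[float]) -> float:
    """Diagonal of a variance-standardized GRM minus 1 (GCTA's Fhat1)."""
    _validate(x, p)
    terms = [(xi - 2 * pi) ** 2 / (2 * pi * (1 - pi)) for xi, pi in zip(x, p)]
    return sum(terms) / len(terms) - 1.0


def f_uni(x: Sequence[int], p: Sequence[float]) -> float:
    """Correlation between uniting gametes (GCTA's Fhat3)."""
    _validate(x, p)
    terms = [(xi * xi - (1 + 2 * pi) * xi + 2 * pi * pi) / (2 * pi * (1 - pi))
             for xi, pi in zip(x, p)]
    return sum(terms) / len(terms)


def f_hom(x: Sequence[int], p: Sequence[float]) -> float:
    """Excess homozygosity by the method of moments: (O - E) / (M - E)."""
    _validate(x, p)
    observed = sum(1 for xi in x if xi != 1)           # homozygous loci
    expected = sum(1 - 2 * pi * (1 - pi) for pi in p)  # expected under HWE
    return (observed - expected) / (len(x) - expected)
```

For a single locus with p = 0.1, an individual homozygous for the common allele (x = 0) gives FGRM ≈ -0.78, FUNI ≈ 0.11 and FHOM = 1.0. FGRM and FUNI therefore give little or negative weight to homozygosity for common alleles and more weight to homozygosity for rare ones, which is consistent with their stronger correlation with homozygosity at ID-causing variants reported above.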
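FROH is conventionally the total length of an individual's runs of homozygosity divided by the length of the genome covered by markers. The sketch below takes ROH segments already called by some detection tool, given as hypothetical (chromosome, start, end) half-open intervals in base pairs, merges overlapping segments on each chromosome, and returns the covered fraction. ROH calling itself, minimum-length thresholds and the choice of genome length (assumed here to be a single autosomal total) are left outside the sketch.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[str, int, int]  # (chromosome, start, end), half-open, in bp


def f_roh(segments: Sequence[Segment], genome_length_bp: int) -> float:
    """Fraction of the genome covered by ROH, after merging overlapping segments."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in segments:
        if end <= start:
            raise ValueError(f"empty or inverted segment: {chrom}:{start}-{end}")
        by_chrom.setdefault(chrom, []).append((start, end))

    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching segment: extend the current run.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current run and start a new one.
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / genome_length_bp
```

Merging overlaps matters when segments come from several callers or overlapping windows; without it, shared stretches would be counted twice and FROH inflated.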

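FPED can be computed from the genealogy with the classical recursive kinship (coancestry) rule: an individual's F equals the kinship between its sire and dam, the kinship of an individual with itself is (1 + F) / 2, and the kinship of two distinct individuals is the mean of the kinships between one of them and the parents of the other, provided the latter is not an ancestor of the former. The sketch assumes a hypothetical dict mapping each animal to (sire, dam), with unknown parents given as `None` and treated as unrelated founders; it is an illustration, not the software used in the study.

```python
from functools import lru_cache
from typing import Callable, Dict, Optional, Tuple

# Hypothetical pedigree format: animal -> (sire, dam); animals not listed are founders.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def make_fped(pedigree: Pedigree) -> Callable[[str], float]:
    def parents(i: str) -> Tuple[Optional[str], Optional[str]]:
        return pedigree.get(i, (None, None))

    @lru_cache(maxsize=None)
    def generation(i: str) -> int:
        # Founders are generation 0; an ancestor always has a smaller number.
        sire, dam = parents(i)
        return max((generation(x) + 1 for x in (sire, dam) if x is not None), default=0)

    @lru_cache(maxsize=None)
    def kinship(a: Optional[str], b: Optional[str]) -> float:
        if a is None or b is None:
            return 0.0  # unknown parent: assumed unrelated founder
        if a == b:
            sire, dam = parents(a)
            return 0.5 * (1.0 + kinship(sire, dam))
        # Recurse through the parents of the individual with the higher
        # generation number, which cannot be an ancestor of the other one.
        if generation(a) > generation(b):
            a, b = b, a
        sire, dam = parents(b)
        return 0.5 * (kinship(a, sire) + kinship(a, dam))

    def fped(i: str) -> float:
        sire, dam = parents(i)
        return kinship(sire, dam)

    return fped


# Example: E is the offspring of full sibs C and D, so FPED(E) = 0.25.
fped = make_fped({"C": ("A", "B"), "D": ("A", "B"), "E": ("C", "D")})
```

The recursion is memoized, but for large pedigrees an iterative tabular method avoids Python's recursion limit. Because FPED is an expectation over the genealogy, it cannot reflect Mendelian sampling at individual loci, which fits the observation above that FPED was uncorrelated with locus-specific homozygosity.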
Created:

2020-10-22
