Jointly representing long-range genetic similarity and spatially heterogeneous isolation-by-distance

Indexed by NIAID Data Ecosystem: 2026-05-02

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.p8cz8wb18

Resource summary:

Isolation-by-distance is a widespread feature of the geographic structure of genetic variation in many species, and many methods have been developed to characterize it in genetic data. However, long-range genetic similarities also exist, often as a result of rare or episodic long-range gene flow. Jointly characterizing isolation-by-distance and long-range genetic similarity is an open data-analysis challenge. Resolving it could yield more complete representations of the geographic structure of genetic variation in any given species. Here, we present a computationally tractable method that identifies long-range genetic similarities against a background of spatially heterogeneous isolation-by-distance. The method uses a coalescent-based framework and models long-range genetic similarity as directional events, each with a source fraction: the fraction of ancestry at a location that traces back to a remote source. The method produces geographic maps annotated with inferred long-range edges, as well as maps of uncertainty in the geographic location of each source of long-range gene flow. We have implemented the method in a package called FEEMSmix (an extension of FEEMS from Marcus et al. 2021) and validated the implementation using simulations representative of typical data applications. We also apply the method to two empirical data sets. In a data set of over 4,000 humans (Homo sapiens) across Afro-Eurasia, we recover many known signals of long-distance dispersal from recent centuries. In a data set of over 100 gray wolves (Canis lupus) across North America, we identify several previously unknown long-range connections, some of which are attributable to errors in the recorded sampling locations. Therefore, beyond identifying genuine long-range dispersals, our approach also serves as a useful quality-control tool for spatial genetic studies.
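
The idea of a directional long-range event with a source fraction can be illustrated with a small covariance calculation. This is a minimal sketch, not the FEEMSmix implementation: it assumes a toy exponential-decay covariance as the isolation-by-distance background (FEEMS itself derives the background from a spatial graph) and treats a long-range edge as a single pulse in which the destination's allele frequency becomes (1 - c) * f_dest + c * f_source, where c is the source fraction. The helper names `ibd_covariance`, `add_long_range_edge`, and `expected_distance` are hypothetical.

```python
import math


def ibd_covariance(n_demes, length_scale):
    """Toy isolation-by-distance covariance on a 1-D line of demes.

    Illustrative assumption only: covariance decays exponentially with the
    number of steps between demes. This is a stand-in background, not the
    graph-based covariance used by FEEMS.
    """
    return [[math.exp(-abs(i - j) / length_scale) for j in range(n_demes)]
            for i in range(n_demes)]


def add_long_range_edge(cov, source, dest, c):
    """Return A @ cov @ A.T for a single pulse from `source` into `dest`.

    After the pulse, the destination frequency is (1 - c) * f_dest + c * f_source;
    every other deme is unchanged, so A is the identity except for row `dest`.
    """
    n = len(cov)
    if source == dest or not (0 <= source < n and 0 <= dest < n):
        raise ValueError("source and dest must be distinct, valid deme indices")
    if not 0.0 <= c <= 1.0:
        raise ValueError("source fraction c must lie in [0, 1]")
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    a[dest][dest] = 1.0 - c
    a[dest][source] = c
    # a_cov = A @ cov
    a_cov = [[sum(a[i][k] * cov[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    # result = a_cov @ A.T
    return [[sum(a_cov[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expected_distance(cov, i, j):
    """Expected squared allele-frequency difference between demes i and j."""
    return cov[i][i] + cov[j][j] - 2.0 * cov[i][j]


# Ten demes on a line; a pulse from deme 0 into the far end (deme 9), c = 0.3.
background = ibd_covariance(n_demes=10, length_scale=2.0)
mixed = add_long_range_edge(background, source=0, dest=9, c=0.3)
print(round(expected_distance(background, 0, 9), 3))  # → 1.978
print(round(expected_distance(mixed, 0, 9), 3))       # → 0.969
```

Because the pulse is a linear mixture of frequencies, the covariance transforms as A Σ Aᵀ. The two end demes, nearly unrelated under the background, become much more similar after the pulse: this is the kind of excess long-range similarity that an isolation-by-distance-only fit would leave unexplained.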

Methods

The wolf data set (wolvesadmix_corrected) consists of 108 individuals and 17,729 SNPs. For this study, we correct the locations of two individuals based on an analysis of the sample metadata and remove three individuals with ambiguous locations from the original data set of 111 wolves compiled in Schweizer et al. 2016 (data available here: https://doi.org/10.5061/dryad.p8cz8wb18). The human data set (c1global1nfd_public) consists of 4,070 individuals and 19,954 SNPs. For this study, we subset the larger data set of 4,697 individuals in Peter et al. 2020 to individuals with public sharing permissions (data available on Zenodo as 'Supplemental information').
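
The wolf curation step (111 compiled wolves, two locations corrected, three ambiguous individuals removed, leaving 108) can be sketched as a simple filter over per-sample metadata. This is a hypothetical illustration: the `Sample` record, the `curate_locations` helper, and every ID and coordinate below are placeholders, not the actual sample identifiers, corrected coordinates, or file format of wolvesadmix_corrected.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sample:
    """Hypothetical per-sample metadata record."""
    sample_id: str
    longitude: float
    latitude: float


def curate_locations(samples, corrected, ambiguous):
    """Drop samples with ambiguous locations and apply coordinate corrections.

    corrected: dict mapping sample_id -> (longitude, latitude)
    ambiguous: set of sample_ids to remove
    """
    known = {s.sample_id for s in samples}
    unknown = (set(corrected) | set(ambiguous)) - known
    if unknown:
        raise KeyError(f"unknown sample ids: {sorted(unknown)}")
    if set(corrected) & set(ambiguous):
        raise ValueError("a sample cannot be both corrected and removed")
    curated = []
    for s in samples:
        if s.sample_id in ambiguous:
            continue  # location too uncertain to place on the map
        if s.sample_id in corrected:
            lon, lat = corrected[s.sample_id]
            s = replace(s, longitude=lon, latitude=lat)
        curated.append(s)
    return curated


# Placeholder inputs only: dummy IDs and dummy coordinates.
samples = [Sample(f"wolf_{i:03d}", 0.0, 0.0) for i in range(111)]
ambiguous = {"wolf_005", "wolf_042", "wolf_099"}
corrected = {"wolf_010": (-110.0, 55.0), "wolf_020": (-95.0, 62.0)}
curated = curate_locations(samples, corrected, ambiguous)
print(len(curated))  # → 108
```

The helper preserves input order, so the same sample ordering can be used to drop the matching rows of the genotype matrix.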

Created:

2025-03-12