Data from: Comparative Performance of Popular Methods for Hybrid Detection using Genomic Data
Source: DataCite Commons. Updated: 2026-03-04. Indexed: 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.6t1g1jwwv
Description:
Interspecific hybridization is an important evolutionary phenomenon that
generates genetic variability in a population and fosters species
diversity in nature. The availability of large genome-scale datasets has
revolutionized hybridization studies to shift from the examination of the
presence or absence of hybrids in nature to the investigation of the
genomic constitution of hybrids and their genome-specific evolutionary
dynamics. Although a handful of methods have been proposed in an attempt
to identify hybrids, accurate detection of hybridization from genomic data
remains a challenging task. The available methods can be classified
broadly as site pattern frequency based and population genetic clustering
approaches, though the performance of the two classes of methods under
different hybridization scenarios has not been extensively examined. Here,
we use simulated data to comparatively evaluate the performance of four
tools that are commonly used to infer hybridization events: the site
pattern frequency based methods HyDe and the D-statistic (i.e.,
the ABBA-BABA test), and the population clustering approaches
structure and ADMIXTURE. We consider single hybridization
scenarios that vary in the time of hybridization and the amount of
incomplete lineage sorting (ILS) for different proportions of parental
contributions (γ); introgressive hybridization; multiple hybridization
scenarios; and a mixture of ancestral and recent hybridization scenarios.
We focus on the statistical power to detect hybridization, the false
discovery rate (FDR) for the D-statistic and HyDe, and the accuracy of the
estimates of γ as measured by the mean squared error for HyDe, structure,
and ADMIXTURE. Both HyDe and the D-statistic demonstrate a high
level of detection power in all scenarios except those with high ILS,
although the D-statistic often has an unacceptably high FDR. The estimates
of γ in HyDe are impressively robust and accurate whereas
structure and ADMIXTURE sometimes fail to identify hybrids,
particularly when the proportional parental contributions are asymmetric
(i.e., when γ is close to 0). Moreover, the posterior distribution
estimated using structure exhibits multimodality in many
scenarios, making interpretation difficult. Our results provide guidance
in selecting appropriate methods for identifying hybrid populations from
genomic data.
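
The quantities named in the description can be sketched in a few lines of Python. This is a minimal illustration, not the code shipped with the dataset: the function names are hypothetical, the D-statistic uses the standard site-pattern form D = (ABBA - BABA) / (ABBA + BABA), and the FDR is assumed to follow the conventional definition (false positives divided by all positive calls), which may differ from the exact definition used in the study.

```python
from typing import Sequence


def d_statistic(n_abba: float, n_baba: float) -> float:
    """D-statistic from counts of ABBA and BABA site patterns."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites observed")
    return (n_abba - n_baba) / total


def detection_power(detected: Sequence[bool]) -> float:
    """Fraction of truly hybrid replicates that were flagged as hybrids."""
    return sum(detected) / len(detected)


def false_discovery_rate(called: Sequence[bool], truth: Sequence[bool]) -> float:
    """False positives over all positive calls (assumed definition).

    Returns 0.0 when no replicate is called a hybrid.
    """
    positives = sum(called)
    if positives == 0:
        return 0.0
    false_positives = sum(c and not t for c, t in zip(called, truth))
    return false_positives / positives


def mean_squared_error(estimates: Sequence[float], true_gamma: float) -> float:
    """Mean squared error of gamma estimates against the simulated value."""
    return sum((g - true_gamma) ** 2 for g in estimates) / len(estimates)
```

For example, `d_statistic(120, 80)` gives 0.2. A value near 0 is what incomplete lineage sorting alone is expected to produce, while a value far from 0 suggests hybridization. In practice, significance is usually assessed with a resampling procedure such as a block jackknife, which this sketch leaves out.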
Provider:
Dryad
Created:
2021-04-09



