
Signals interpreted as archaic introgression are driven primarily by accelerated evolution in Africa

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.2fqz612kn
Description:
Non-African humans appear to carry a few percent archaic DNA due to ancient inter-breeding. This modest legacy and its likely recent timing imply that most introgressed fragments will be rare and hence will occur mainly in the heterozygous state. I tested this prediction by calculating D statistics, a measure of legacy size, for pairs of humans in which one member of the pair was conditioned always to be either homozygous or heterozygous. Using coalescent simulations, I confirmed that conditioning the non-African to be heterozygous increased D, while conditioning the non-African to be homozygous reduced D to zero. Repeating the analysis with real data reveals the exact opposite pattern. In African versus non-African comparisons, D is near zero if the African individual is held homozygous. Conditioning one of two Africans to be either homozygous or heterozygous invariably generates large values of D, even when both individuals are drawn from the same population. In every case, the African with more heterozygous sites (conditioned heterozygous > unconditioned > conditioned homozygous) appears less related to the archaic. In contrast, the same analysis applied to pairs of non-Africans always yields near-zero D, showing that conditioning does not create large D without an underlying signal to expose. Large D values in humans are therefore driven almost entirely by heterozygous sites in Africans acting to increase divergence from related taxa such as Neanderthals. Compared with heterozygous Africans, individuals that lack African heterozygous sites, whether non-African or conditioned homozygous African, always appear more similar to archaic outgroups, a signal previously interpreted as evidence for introgression. I hope these analyses will encourage others to consider increased divergence, as well as increased similarity to archaics, as a mechanism capable of driving asymmetrical base-sharing.

Methods

These files are parsed versions of publicly available data.
For the Altai Neanderthal, I used simple C++ scripts applied to data available at http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/ to extract the minimal information required for my analyses: base location, human reference base and the four base counts, in the order A, C, G, T. For the chimpanzee-human alignment, I downloaded raw alignments from http://www.ensembl.org/info/data/ftp/, extracted the bases and aligned them to the human reference sequence hs37d5 used by the 1000 Genomes Project. I excluded sites within 30 of either end of a contig. Because most contigs are many tens of kb long, little information is lost, while possible alignment edge effects should be avoided.
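
As a rough illustration of how data in this shape could feed the conditioned D analysis described above, the following Python sketch computes a frequency-based ABBA-BABA D statistic for two humans, an archaic and a chimpanzee outgroup. It is not the author's code (the original scripts were written in C++ and are not shown here). The `Site` record, its field names, the 0-based position convention, the treatment of the chimpanzee base as ancestral, and the reading of "within 30 of either end" as a 30-position margin are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

BASES = "ACGT"  # order of the four archaic base counts given in the Methods


@dataclass(frozen=True)
class Site:
    """One biallelic position. All field names are hypothetical."""
    pos: int            # assumed 0-based offset within the contig
    ancestral: str      # chimpanzee base, assumed to be the ancestral state
    derived: str        # the other base segregating in humans
    p1: float           # derived-allele frequency in human 1 (0, 0.5 or 1)
    p2: float           # derived-allele frequency in human 2 (0, 0.5 or 1)
    arch_counts: tuple  # archaic base counts in A, C, G, T order


def archaic_freq(site: Site) -> Optional[float]:
    """Derived-allele frequency in the archaic, or None if no usable counts."""
    anc = site.arch_counts[BASES.index(site.ancestral)]
    der = site.arch_counts[BASES.index(site.derived)]
    return der / (anc + der) if anc + der > 0 else None


def away_from_edges(site: Site, contig_len: int, margin: int = 30) -> bool:
    """Keep sites at least `margin` positions from both contig ends (assumed reading)."""
    return margin <= site.pos < contig_len - margin


def d_statistic(sites: Iterable[Site], contig_len: int,
                condition_on: Optional[int] = None,
                state: Optional[str] = None) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA).

    Positive D means human 2 shares more derived alleles with the archaic
    than human 1 does. If condition_on is 1 or 2, only sites where that
    human is heterozygous (state="het") or homozygous (state="hom") count.
    """
    if condition_on is not None and (
            condition_on not in (1, 2) or state not in ("het", "hom")):
        raise ValueError("condition_on must be 1 or 2 and state 'het' or 'hom'")
    abba = baba = 0.0
    for s in sites:
        if not away_from_edges(s, contig_len):
            continue
        p3 = archaic_freq(s)
        if p3 is None:
            continue
        if condition_on is not None:
            p = s.p1 if condition_on == 1 else s.p2
            if (0.0 < p < 1.0) != (state == "het"):
                continue
        abba += (1.0 - s.p1) * s.p2 * p3
        baba += s.p1 * (1.0 - s.p2) * p3
    total = abba + baba
    return (abba - baba) / total if total > 0 else float("nan")


# Toy data, invented purely to exercise the functions above.
derived_arch = (0, 0, 10, 0)  # archaic reads all carry the derived base G
example = [
    Site(100, "A", "G", 0.0, 1.0, derived_arch),
    Site(200, "A", "G", 1.0, 0.0, derived_arch),
    Site(300, "A", "G", 0.0, 0.5, derived_arch),
    Site(400, "A", "G", 0.5, 0.0, derived_arch),
    Site(10, "A", "G", 0.0, 1.0, derived_arch),    # dropped: too close to the contig end
    Site(500, "A", "G", 0.0, 1.0, (0, 0, 0, 0)),   # dropped: no archaic counts
]
```

Calling `d_statistic(example, 1000)` gives the unconditioned value, while `d_statistic(example, 1000, condition_on=2, state="het")` restricts the calculation to sites where the second individual is heterozygous. Comparing the two on real data is the kind of contrast the description reports, although the exact site filters and weighting used in the original analysis may differ from this sketch.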
Created:
2020-08-06