Archaic introgression for HGDP and 1000genomes in hg38
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14136627
Description:
These files contain the inferred positions of introgressed archaic sequence in the 1000 Genomes and HGDP datasets.
The segments are identified using hmmix (https://github.com/LauritsSkov/Introgression-detection). The datasets are phased, so segments are inferred for each haplotype. Each dataset has two files: a *segments.txt file and a *SNPS.txt file.
> The columns in the segments.txt file are:
name: name of individual
haplotype: either hap1 or hap2 (if a genotype is 0|1, then 0 is on hap1 and 1 is on hap2)
pop: population from HGDP or 1000 Genomes
region: region from HGDP or 1000 Genomes; one of AMERICA, CENTRAL_SOUTH_ASIA, EAST_ASIA, EUROPE, MIDDLE_EAST or OCEANIA
chrom: chromosome in hg38 (the X chromosome is not included)
start: start coordinate of the introgressed segment in hg38
end: end coordinate of the introgressed segment in hg38
mean_prob: mean posterior probability that a segment is archaic according to hmmix (I usually recommend a cutoff of 0.8)
ND_type: which sequenced archaic the segment shares more derived SNPs with; one of Both, Denisova, Neanderthal or none
snps: number of derived SNPs on the segment NOT seen in sub-Saharan Africa
admixpopvariants: number of derived SNPs shared with a sequenced archaic genome
Altai: number of derived SNPs shared with the Altai Neanderthal (Denisova5)
Vindija: number of derived SNPs shared with the Vindija Neanderthal (Vindija33.19)
Denisova: number of derived SNPs shared with the Denisovan (Denisova3)
Chagyrskaya: number of derived SNPs shared with the Chagyrskaya Neanderthal (Chagyrskaya8)
variants: list of derived SNPs on the segment NOT seen in sub-Saharan Africa
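The recommended mean_prob cutoff can be applied while reading a segments.txt file. The following is a minimal sketch, not part of the original record. It assumes the file is tab-delimited with a header row whose names match the columns listed above, and that end minus start is a reasonable measure of segment length. Neither assumption is confirmed by the description. The function names and the toy rows are hypothetical and purely illustrative.

```python
import csv
import io


def read_confident_segments(handle, min_prob=0.8):
    """Yield segment rows whose mean_prob is at least min_prob.

    Assumes a tab-delimited file with a header row (an assumption,
    not something the dataset description states).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if float(row["mean_prob"]) >= min_prob:
            row["start"] = int(row["start"])
            row["end"] = int(row["end"])
            yield row


def archaic_bp_per_haplotype(rows):
    """Sum segment lengths (end - start) per (name, haplotype) pair."""
    totals = {}
    for row in rows:
        key = (row["name"], row["haplotype"])
        totals[key] = totals.get(key, 0) + (row["end"] - row["start"])
    return totals


# Made-up rows for illustration only; they are not taken from the dataset
# and include only the columns this sketch uses.
TOY_SEGMENTS = (
    "name\thaplotype\tchrom\tstart\tend\tmean_prob\tND_type\n"
    "IND_A\thap1\tchr1\t1000000\t1050000\t0.95\tNeanderthal\n"
    "IND_A\thap1\tchr2\t2000000\t2020000\t0.60\tnone\n"
    "IND_A\thap2\tchr1\t3000000\t3030000\t0.80\tDenisova\n"
    "IND_B\thap1\tchr1\t500000\t540000\t0.99\tBoth\n"
)

segment_totals = archaic_bp_per_haplotype(
    read_confident_segments(io.StringIO(TOY_SEGMENTS))
)
for (name, hap), total_bp in sorted(segment_totals.items()):
    print(name, hap, total_bp)
```

For a real file, replace the io.StringIO object with an opened file handle, for example open(path, newline=""), where path points at one of the *segments.txt files.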
> The columns in the SNPS.txt file are:
chrom: chromosome in hg38 (the X chromosome is not included)
pos: position of the SNP in hg38 coordinates
snptype: can be shared derived with archaic (DAV), in high LD
ancestralbase: the ancestral base
derivedbases: the derived base (there can be multiple, but >99% are biallelic)
freq_in_dataset: frequency of the most common derived base (in percent, so the number is between 0 and 100)
ND: derived in Denisovans only (ND01), in Neanderthals only (ND10), in both Neanderthals and Denisovans (ND11), or none
sharedwith: which archaic genomes the derived allele(s) are shared with. This is not limited to the four high-coverage archaics
Created:
2024-11-13
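The ND codes in a SNPS.txt file can be decoded and tallied as follows. This is a minimal sketch under the same assumptions as for the segments file: tab-delimited text with a header row matching the column names above, which the description does not confirm. The helper names and the toy rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Mapping taken from the column description:
# code -> (derived in Neanderthals, derived in Denisovans)
ND_MEANING = {
    "ND01": (False, True),
    "ND10": (True, False),
    "ND11": (True, True),
    "none": (False, False),
}


def count_nd_classes(handle):
    """Count SNPs per ND class in a tab-delimited SNPS table (assumed layout)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["ND"] for row in reader)


def count_neanderthal_derived(counts):
    """Number of SNPs derived in Neanderthals (classes ND10 and ND11)."""
    return sum(n for code, n in counts.items() if ND_MEANING[code][0])


# Made-up rows for illustration only; only a subset of columns is shown.
TOY_SNPS = (
    "chrom\tpos\tfreq_in_dataset\tND\n"
    "chr1\t100\t12.5\tND10\n"
    "chr1\t200\t3.0\tND11\n"
    "chr1\t300\t0.4\tND01\n"
    "chr2\t400\t55.0\tND10\n"
    "chr2\t500\t1.2\tnone\n"
)

nd_counts = count_nd_classes(io.StringIO(TOY_SNPS))
print(dict(nd_counts))
print(count_neanderthal_derived(nd_counts))
```

Note that freq_in_dataset is given in percent, so divide by 100 before comparing it with allele frequencies expressed as fractions.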



