
Data for hapROH

NIAID Data Ecosystem, indexed 2026-03-12
Download link:
https://zenodo.org/record/4992531
Resource description:
This dataset contains the data used in the publication describing hapROH, software that is available as a Python package.

1) Processed 1000 Genomes reference data: ./1000g1240khdf5.tar.gz
This compressed archive contains the 1000 Genomes reference haplotype panel, downsampled to bi-allelic 1240K SNPs and converted to .hdf5 format. This data was used throughout the analysis of human ancient DNA data. When unpacked, the archive contains:
1A) a folder for all 22 autosomes (./all1240), which holds a .hdf5 file for each human chromosome as well as a metadata table required by hapROH;
1B) a folder (./chX1240) that contains the same reference data for X-chromosome-specific analysis.

2) aDNA dataset used to call ROH: ./v42.4.1240K.tar
This tarball contains the ancient genetic data used for calling ROH (pseudohaploid and diploid genetic data in Eigenstrat format). It is a direct copy of the Allen Ancient DNA Resource, compiled and shared publicly by the Reich lab, and was downloaded from https://reich.hms.harvard.edu in April 2020. This is version V42.4, released in March 2020. The Reich lab is preparing a data publication for the Allen Ancient DNA Resource; all citations should refer to that publication. The copy here is a "freeze" for reproducibility and long-term storage.

3) Sardinian aDNA dataset + Human Origins data: ./marcus_et_al2020.hdf5 and ./marcus2020_meta.tsv
This .hdf5 file contains genetic data for n=3039 individuals. It is the source file for Marcus et al. 2020 (https://doi.org/10.1038/s41467-020-14523-6), and the data was processed as outlined in that publication. It includes the newly reported ancient Sardinians (n=85), other publicly available ancient individuals (n=1013), and genotypes of the freely available modern Human Origins individuals (n=1941). For ancient data, read-count data is available in the hdf5 field "calldata/AD"; pseudo-haploid genotype data, and diploid genotype data (moderns only), are in "calldata/GT". Metadata that matches the order of individuals in the .hdf5 file is reported in "./marcus2020_meta.tsv", a tab-separated text file.

4) ROH calls for each individual in the manuscript: ./roh.calls.global_data.tsv
This tab-separated table (hence .tsv) contains ROH calls for each of the XX ancient and YY modern individuals screened for ROH in the manuscript. Each row is one unique individual, and the columns report ROH summary statistics (e.g. longest inferred ROH, sum of ROH in Morgan length bins) as well as relevant individual metadata (e.g. latitude, longitude, mean age estimate, average depth on 1240K/HO markers).

5) Bar figures of ROH in modern samples, grouped by label: ./roh_bars_modern.zip
Each plot visualizes ROH calls of all modern individuals. Every bar represents one individual, broken up into four color-coded length categories.

6) Bar figures of ROH in ancient individuals, grouped by geographic transect: ./roh_bars_ancient_transects.zip
Each plot visualizes ROH calls of all ancient individuals within a geographic transect (as defined in the paper and reported in ./roh.calls.global_data.tsv). Every bar represents one ancient individual, broken up into four color-coded length categories.
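
The two tab-separated tables described above (./marcus2020_meta.tsv and ./roh.calls.global_data.tsv) can be inspected with the Python standard library alone. The following is a minimal sketch, not part of the dataset or of hapROH: the helper names read_tsv and column_names are hypothetical, and it assumes the files are UTF-8 encoded with a single header row. The column names themselves are not listed in the description, so the sketch reads them from the file rather than hard-coding any.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Read a tab-separated table into a list of dicts keyed by the header row.

    Assumption: the file is UTF-8 encoded and its first line is the header.
    All values are returned as strings; convert numeric columns as needed.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def column_names(rows):
    """Return the column names of a table loaded with read_tsv (empty list if no rows)."""
    return list(rows[0].keys()) if rows else []
```

For example, after downloading the files from the Zenodo record, read_tsv("roh.calls.global_data.tsv") should return one dict per individual, and column_names on that result lists the summary-statistic and metadata columns actually present in the table.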
Created:
2021-07-02