Inferring whole-genome histories in large population datasets: inferred tree sequences for Simons Genome Diversity Project
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3052358
Description:
Tree sequences for the Simons Genome Diversity Project (SGDP) autosomes, inferred using tsinfer version 0.1.4 and compressed using tszip. Each file can be decompressed as follows:
$ tsunzip sgdp_chr1.trees.tsz
Once decompressed, the .trees files can be loaded and processed using tskit.
import tskit
ts = tskit.load("sgdp_chr1.trees")
# ts is an instance of tskit.TreeSequence
print("Chromosome 1 contains {} trees".format(ts.num_trees))
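As an alternative to running tsunzip first, a compressed file can also be read directly from Python. The following is a minimal sketch, assuming the tszip Python package is installed alongside tskit; the helper name load_tree_sequence is hypothetical and not part of either library.

```python
from pathlib import Path


def load_tree_sequence(path):
    """Load a tree sequence from a .trees file or a tszip-compressed .tsz file.

    Hypothetical helper: dispatches on the file extension only.
    """
    path = Path(path)
    if path.suffix == ".tsz":
        # tszip is only needed for compressed input, so import it lazily.
        import tszip

        return tszip.decompress(str(path))
    import tskit

    return tskit.load(str(path))
```

Calling load_tree_sequence("sgdp_chr1.trees.tsz") would then return a tskit.TreeSequence without writing the decompressed file to disk.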
Metadata associated with individuals and populations was derived from the original source and converted to JSON form. For example, to access individual metadata we can use:
import tskit
import json
ts = tskit.load("sgdp_chr1.trees")
ind = ts.individual(0)
metadata_dict = json.loads(ind.metadata)
The metadata_dict variable will now contain all the metadata for the individual with ID 0 as a dictionary. Metadata associated with populations can be found in a similar way. Population IDs are associated with individuals via their constituent nodes. For example,
pop_metadata = [json.loads(pop.metadata) for pop in ts.populations()]
ind_node = ts.node(ind.nodes[0])
ind_pop_metadata = pop_metadata[ind_node.population]
After this, the ind_pop_metadata variable will contain the population level metadata for individual ID 0.
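The same lookup can be repeated for every individual in the file. The sketch below is one possible way to do this, not the pipeline's own code: the helper names decode_metadata and population_metadata_by_individual are hypothetical, and the handling of empty or already-decoded metadata is an assumption added for robustness (depending on the tskit version and whether a metadata schema is present, metadata may arrive as raw JSON bytes or as a dictionary).

```python
import json


def decode_metadata(raw):
    """Return metadata as a dict, whether it is JSON bytes/str or already decoded."""
    if isinstance(raw, dict):
        return raw
    if not raw:
        # Empty bytes or string: no metadata was recorded.
        return {}
    return json.loads(raw)


def population_metadata_by_individual(ts):
    """Map each individual ID to the metadata of its first node's population."""
    pop_metadata = [decode_metadata(pop.metadata) for pop in ts.populations()]
    result = {}
    for ind in ts.individuals():
        if len(ind.nodes) == 0:
            # No nodes, so there is no population to look up.
            continue
        node = ts.node(ind.nodes[0])
        if node.population < 0:
            # A population of -1 (tskit.NULL) means none was assigned.
            continue
        result[ind.id] = pop_metadata[node.population]
    return result
```

With ts loaded as above, population_metadata_by_individual(ts)[0] should match the ind_pop_metadata value from the previous example.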
The full data pipeline used to generate these tree sequences and associated metadata is available on GitHub.
Created:
2020-01-24



