Data for Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3570311
Resource description:
Description of the datasets
Data are organized as folders and compressed with tar.gz.
There are two compressed data folders: data, which was used for the cattle genome graph experiments, and data_human, which was used for the human genome graph experiments.
Cattle genome graphs experiments
First, extract the archive with the command tar -xvzf data.tar.gz. After extraction, the data folder is organized as follows:
Utilities: contains the bovine ARS-UCD 1.2 FASTA reference with the accompanying index.
Bin: contains the software used in the paper (vg, liftover, vcf2diploid).
Part1: data for the analysis in the variant prioritization section, further subdivided into:
vcf_sim: variant files from four animals in each breed, used to simulate reads
reads_sim: simulated short reads used for read mapping
vcf_freq: variants used to augment the graphs, filtered by allele frequency
Part2: data used for the analysis in the section on graph mapping with breed-filtered variants, further subdivided into:
vcf_breed: variant files used for graph construction.
Part3: data used for the analysis in the consensus genome section, further subdivided into:
read_sims: simulated reads as in Part1, but with coordinates lifted over to the new consensus genomes.
reference: contains the original reference and the consensus references.
vcf_consensus: contains the major-allele variants used to construct the consensus genomes.
Part4: data for the analysis in the section on whole-genome graph construction and variant genotyping.
vcf_construct: variants from chromosomes 1-29 of 82 Brown Swiss animals, used to construct the BSW whole-genome graph.
BSW_graph: whole-genome Brown Swiss graph with its three accompanying indexes (xg, gcsa, and gbwt).
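Before extracting a large archive, it can help to confirm that it contains the folders described above. The following is a minimal Python sketch, not part of the dataset: the function names, the root folder name "data", and the lowercase folder spellings are assumptions for illustration, and the actual names inside the archive may differ in case or spelling.

```python
import tarfile
from pathlib import PurePosixPath

# Folder names taken from the description above; their exact spelling and
# case inside the archive are an assumption, so comparison is case-insensitive.
EXPECTED_CATTLE = ["utilities", "bin", "part1", "part2", "part3", "part4"]


def top_level_entries(archive_path, root="data"):
    """Return the sorted names found directly under `root` in a .tar.gz archive."""
    entries = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for name in tar.getnames():
            parts = PurePosixPath(name).parts
            if len(parts) >= 2 and parts[0] == root:
                entries.add(parts[1])
    return sorted(entries)


def missing_folders(archive_path, expected=EXPECTED_CATTLE, root="data"):
    """Return the expected folder names that were not found under `root`."""
    found = {entry.lower() for entry in top_level_entries(archive_path, root)}
    return [name for name in expected if name.lower() not in found]
```

Calling missing_folders("data.tar.gz") on the downloaded archive would return an empty list if every folder listed above is present under the assumed root. The check only reads the archive index and does not extract any files.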
Human genome graphs experiments
First, extract the data_human archive with the command tar -xvzf data_hum.tar.gz. After extraction, the data folder is organized as follows:
reference: the g1k_v37 reference used as a graph backbone
vcf_sim: variant files from four individuals in each population used to simulate reads
reads_sim: simulated short reads used for read mapping
vcf_freq: variants used to augment the graphs, filtered by allele frequency
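The vcf_freq folders hold variants that were filtered by allele frequency. As an illustration of what such a filter does, here is a minimal Python sketch. It is not the authors' pipeline: it assumes each record carries an AF entry in the VCF INFO column, and the function names and the threshold passed in are hypothetical.

```python
def info_allele_frequency(info_field):
    """Return the first AF value from a VCF INFO string, or None if absent."""
    for item in info_field.split(";"):
        if item.startswith("AF="):
            # Multi-allelic records list one AF per alternate allele;
            # this sketch only looks at the first one.
            return float(item[3:].split(",")[0])
    return None


def filter_by_af(vcf_lines, min_af):
    """Yield header lines and the records whose AF is at least `min_af`."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        af = info_allele_frequency(fields[7])  # INFO is the 8th VCF column
        if af is not None and af >= min_af:
            yield line
```

Records without an AF entry are dropped in this sketch; whether that matches the filtering used for the dataset is not stated in the description.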
Created:
2020-04-23



