Data for Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://zenodo.org/record/3570311

Resource summary:

Description of the datasets

Data are organized as folders and compressed as tar.gz archives. There are two compressed data folders: data, which was used for the cattle genome graph experiments, and data_human, which was used for the human genome graph experiments.

Cattle genome graph experiments

First, extract the archive with the command tar -xvzf data.tar.gz. After extraction, the data folder is organized as follows:

- Utilities: contains the bovine ARS-UCD 1.2 FASTA reference with its accompanying index.
- Bin: contains the software used in the paper (vg, liftover, vcf2diploid).
- Part1: data for the analysis in the variant prioritization section, subdivided into:
  - vcf_sim: variant files from four animals of each breed, used to simulate reads.
  - reads_sim: simulated short reads used for read mapping.
  - vcf_freq: variants added to the graphs, filtered by allele frequency.
- Part2: data for the analysis in the section on graph mapping with breed-filtered variants, subdivided into:
  - vcf_breed: variant files used for graph construction.
- Part3: data for the analysis in the consensus genome section, subdivided into:
  - read_sims: simulated reads as in Part1, but with coordinates lifted over to the new consensus genomes.
  - reference: contains the original reference and the consensus references.
  - vcf_consensus: contains the major-allele variants used to construct the consensus genomes.
- Part4: data for the analysis in the section on whole-genome graph construction and variant genotyping, subdivided into:
  - vcf_construct: variants on chromosomes 1-29 from 82 Brown Swiss cattle, used to construct the BSW whole-genome graph.
  - BSW_graph: the whole-genome Brown Swiss graph with its three accompanying indexes (xg, gcsa, and gbwt).

Human genome graph experiments

First, extract the data_human archive with the command tar -xvzf data_hum.tar.gz. After extraction, the data folder is organized as follows:

- reference: the g1k_v37 reference used as the graph backbone.
- vcf_sim: variant files from four individuals of each population, used to simulate reads.
- reads_sim: simulated short reads used for read mapping.
- vcf_freq: variants added to the graphs, filtered by allele frequency.
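After extraction, it can be useful to confirm that the cattle folder layout described above is actually present before starting an analysis. The following is a minimal Python sketch of such a check, not part of the dataset. The names extract_archive, missing_folders, and EXPECTED_CATTLE_LAYOUT are hypothetical helpers introduced for illustration, and the folder names (including capitalisation) are taken from the description, so they are assumptions that may differ from the names on disk.

```python
import tarfile
from pathlib import Path

# Folder layout as listed in the description. The exact on-disk spelling
# and capitalisation are assumptions; adjust them to match the archive.
EXPECTED_CATTLE_LAYOUT = {
    "Utilities": [],
    "Bin": [],
    "Part1": ["vcf_sim", "reads_sim", "vcf_freq"],
    "Part2": ["vcf_breed"],
    "Part3": ["read_sims", "reference", "vcf_consensus"],
    "Part4": ["vcf_construct", "BSW_graph"],
}


def extract_archive(archive, dest):
    """Extract a gzip-compressed tar archive into dest (like tar -xzf)."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)


def missing_folders(root, layout=EXPECTED_CATTLE_LAYOUT):
    """Return the expected folders that are absent under root.

    A missing top-level folder is reported once; its sub-folders are
    not listed separately.
    """
    root = Path(root)
    missing = []
    for top, subfolders in layout.items():
        if not (root / top).is_dir():
            missing.append(top)
            continue
        for sub in subfolders:
            if not (root / top / sub).is_dir():
                missing.append(f"{top}/{sub}")
    return missing


if __name__ == "__main__":
    # Assumes data.tar.gz is in the current directory and unpacks to ./data
    extract_archive("data.tar.gz", ".")
    print(missing_folders("data") or "all expected folders found")
```

An empty list from missing_folders means every folder named in the description was found. Any entries returned point to a naming mismatch or an incomplete extraction.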

Created:

2020-04-23
