Data for Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery

Source: NIAID Data Ecosystem. Indexed: 2026-03-11
Download link:
https://zenodo.org/record/3570311
Description:
Description of the datasets

Data are organized as folders and compressed as tar.gz archives. There are two compressed data folders: data, which was used for the cattle genome graph experiments, and data_human, which was used for the human genome graph experiments.

Cattle genome graph experiments

First, extract the archive with the command tar -xvzf data.tar.gz. After extraction, the data folder is organized as follows:

Utilities: contains the bovine ARS-UCD 1.2 FASTA reference with the accompanying index.
Bin: contains the software used in the paper (vg, liftover, vcf2diploid).
Part1: data for the analysis in the variant prioritization section, subdivided into:
  vcf_sim: variant files from four animals in each breed, used to simulate reads.
  reads_sim: simulated short reads used for read mapping.
  vcf_freq: variants augmented to graphs, filtered based on allele frequency.
Part2: data for the analysis in the section on graph mapping with breed-filtered variants, subdivided into:
  vcf_breed: variant files used for graph construction.
Part3: data for the analysis in the consensus genome section, subdivided into:
  read_sims: simulated reads as in Part1, but with coordinates lifted over to the new consensus genomes.
  reference: contains the original reference and the consensus references.
  vcf_consensus: contains the major-allele variants used to construct the consensus genomes.
Part4: data for the analysis in the section on whole-genome graph construction and variant genotyping, subdivided into:
  vcf_construct: variants from chromosomes 1-29 of 82 Brown Swiss cattle, used to construct the BSW whole-genome graph.
  BSW_graph: whole-genome Brown Swiss graph with the three accompanying indexes (xg, gcsa, and gbwt).

Human genome graph experiments

First, extract the data_human archive with the command tar -xvzf data_hum.tar.gz. After extraction, the data folder is organized as follows:

reference: the g1k_v37 reference used as the graph backbone.
vcf_sim: variant files from four individuals in each population, used to simulate reads.
reads_sim: simulated short reads used for read mapping.
vcf_freq: variants augmented to graphs, filtered based on allele frequency.
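The extraction step described above can also be scripted. The following is a minimal Python sketch, not part of the dataset or the paper. It assumes the archive unpacks into a top-level data folder and that the subfolder names match the description; the exact spelling and case inside the archive may differ. The helper names extract_archive and missing_folders are hypothetical.

```python
import tarfile
from pathlib import Path

# Folder names as listed in the description. Their exact spelling and case
# inside the archive are an assumption and may differ.
EXPECTED_CATTLE = ["Utilities", "Bin", "Part1", "Part2", "Part3", "Part4"]


def extract_archive(archive, dest="."):
    """Extract a .tar.gz archive into dest (similar to tar -xzf) and return dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        tar.extractall(dest)
    return dest


def missing_folders(root, expected):
    """Return expected folder names not found directly under root (case-insensitive)."""
    present = {p.name.lower() for p in Path(root).iterdir() if p.is_dir()}
    return [name for name in expected if name.lower() not in present]


if __name__ == "__main__":
    archive = Path("data.tar.gz")  # downloaded from the Zenodo record
    if archive.exists():
        out = extract_archive(archive)
        # Assumes the archive contains a top-level "data" folder.
        print("Missing folders:", missing_folders(out / "data", EXPECTED_CATTLE))
```

An empty list from missing_folders suggests the layout matches the description. Any names it reports are worth checking by hand, since the mismatch may only reflect a naming difference.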
Created:
2020-04-23