Processed Whole-Genome Variant Data (VCF) and Sample ID Lists for African and European Ancestry Cohorts

Source: Figshare. Updated 2026-03-12; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_b_Processed_Whole-Genome_Variant_Data_VCF_and_Sample_ID_Lists_for_African_and_European_Ancestry_Cohorts_b_/31568572
Description:
README – Processed Whole-Genome Variant Dataset

This dataset contains processed variant-level information derived from whole-genome sequencing (WGS) data generated from participants of the Louisiana Osteoporosis Study (LOS). The data are organized for multi-ancestry genomic analyses and include variant location and allele information for two ancestry groups: African ancestry (AFA) and European ancestry (CAU).

Genomic DNA was extracted from peripheral blood samples and sequenced on the DNBSEQ-500 platform (BGI Americas) using 150-bp paired-end reads. Sequencing reads were aligned to the human reference genome GRCh38/hg38 using Burrows-Wheeler Aligner (BWA v0.7.12). Variant discovery followed the Genome Analysis Toolkit (GATK v4.0.3) Best Practices workflow with HaplotypeCaller, and variant quality score recalibration (VQSR) was applied to obtain high-confidence variants.

Genotype data processing and quality control followed the GoDMC pipeline. Quality control was applied to autosomal chromosomes (1–22). Variants failing Hardy–Weinberg equilibrium (P 0.2. Individuals deviating more than 7 standard deviations from the mean along any principal component were considered ancestry outliers and removed. Principal components were recalculated after outlier removal.

After quality control, 7,803,505 variants remained in the African ancestry cohort and 5,792,887 variants remained in the European ancestry cohort.

The files distributed in this dataset contain variant-level genomic coordinates and allele information derived from the quality-controlled dataset and are intended for genomic annotation, reference panel harmonization, variant catalog comparison, and other population-level analyses.

Files Included

AFA_chr_all.vcf.gz
Variant-level VCF file containing genomic positions and allele information for variants identified in individuals of African ancestry.

CAU_chr_all.vcf.gz
Variant-level VCF file containing genomic positions and allele information for variants identified in individuals of European ancestry.

intersect_ids_AA.txt
Sample identifiers retained after quality control and cross-dataset intersection for the African ancestry cohort.

intersect_ids_EA.txt
Sample identifiers retained after quality control and cross-dataset intersection for the European ancestry cohort.
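
As a rough illustration of how the distributed files might be read, the sketch below streams variant sites from a gzip-compressed VCF and loads a sample ID list using only the Python standard library. It relies on the standard VCF column order (CHROM, POS, ID, REF, ALT). The layout of the ID lists is not described in the README, so the one-identifier-per-line format is an assumption, and the names Site, iter_sites, count_sites, and read_sample_ids are hypothetical helpers, not part of the dataset.

```python
import gzip
from typing import Iterator, List, NamedTuple, Tuple


class Site(NamedTuple):
    """Coordinates and alleles of one VCF record (hypothetical helper type)."""
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]


def iter_sites(vcf_path: str) -> Iterator[Site]:
    """Yield one Site per data line of a gzip-compressed VCF file."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            # Skip meta-information lines, the column header, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # Standard VCF column order: CHROM, POS, ID, REF, ALT, ...
            chrom, pos, _variant_id, ref, alt = fields[:5]
            yield Site(chrom, int(pos), ref, tuple(alt.split(",")))


def count_sites(vcf_path: str) -> int:
    """Count the data records in the file without holding them in memory."""
    return sum(1 for _ in iter_sites(vcf_path))


def read_sample_ids(path: str) -> List[str]:
    """Read sample identifiers.

    Assumption: one identifier per line; if a line has several
    whitespace-separated columns, only the first is kept.
    """
    with open(path) as handle:
        return [line.split()[0] for line in handle if line.strip()]
```

For example, count_sites("AFA_chr_all.vcf.gz") could be compared with the 7,803,505 variants reported above. The two numbers would only be expected to match if the file stores exactly one record per retained variant, which the README does not state.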
Created:
2026-03-12