Annotation-Weighted and Unweighted Gene-Based Analysis of Rare Germline Variants Associated with Pancreatic Ductal Adenocarcinoma Risk
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14652412
Resource description:
Dataset Description: This dataset contains results from gene-based analyses investigating the role of rare germline variants in pancreatic ductal adenocarcinoma (PDAC) risk. The analyses used genotype data from the Pancreatic Cancer Cohort Consortium (PanScan-PanC4, including PanScan I, PanScan II, and PanC4) and the UK Biobank, comprising 14,254 and 11,021 samples, respectively.
Gene Definition and Variant Selection: The analyses included all 19,264 protein-coding genes on autosomal chromosomes. Each gene region was defined as spanning from 5 kb upstream of the first known exon to 2 kb downstream of the last known exon. Rare variants were defined as those with a minor allele frequency (MAF) between 1×10⁻⁵ and 0.01. After quality control, rare variants were extracted separately for each dataset (PanScan-PanC4 and UK Biobank) and stored in PLINK file format.
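The gene window and rare-variant definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the pipeline used to build the dataset: the function names, the inclusive MAF bounds, and the strand handling (treating "upstream" as the lower-coordinate side only for plus-strand genes) are assumptions, since the description does not state them.

```python
UPSTREAM_BP = 5_000     # 5 kb upstream of the first known exon
DOWNSTREAM_BP = 2_000   # 2 kb downstream of the last known exon
MAF_MIN = 1e-5          # lower MAF bound from the description
MAF_MAX = 0.01          # upper MAF bound from the description


def gene_window(leftmost_exon_start, rightmost_exon_end, strand="+"):
    """Return the (start, end) analysis window for a gene.

    Coordinates are in genomic order (leftmost <= rightmost).
    Assumption: for minus-strand genes, "upstream" lies on the
    higher-coordinate side, so the two paddings are swapped.
    """
    if strand == "+":
        start = leftmost_exon_start - UPSTREAM_BP
        end = rightmost_exon_end + UPSTREAM_BP - UPSTREAM_BP + DOWNSTREAM_BP
    else:
        start = leftmost_exon_start - DOWNSTREAM_BP
        end = rightmost_exon_end + UPSTREAM_BP
    # Clamp to the first base of the chromosome.
    return max(1, start), end


def is_rare(maf):
    """True if the MAF falls in the rare range (bounds assumed inclusive)."""
    return MAF_MIN <= maf <= MAF_MAX
```

For example, a plus-strand gene whose exons span positions 10,000 to 20,000 would get the window 5,000 to 22,000 under these assumptions.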
Annotation-Unweighted Gene-Based Analysis: The MAGMA tool (version 1.10) was used to conduct gene-based analysis for each dataset, applying the --gene-model snp-wise=mean and --burden all flags to compute SKAT values with inverse variance weighting. Covariates included age, sex, genotyping array, and the top 10 principal components. Results from the PanScan-PanC4 and UK Biobank datasets were combined using a fixed-effect meta-analysis (--meta genes=), and statistical significance was assessed with Bonferroni correction (threshold: 2.59 × 10⁻⁶).
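As a rough check on the significance threshold, a Bonferroni correction divides the family-wise error rate by the number of tests. The sketch below assumes a family-wise alpha of 0.05 and one test per gene across the 19,264 genes described above; neither value is stated explicitly as the basis of the reported threshold.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed inputs: alpha = 0.05, one test per protein-coding gene.
threshold = bonferroni_threshold(0.05, 19_264)
print(f"{threshold:.2e}")
```

Under these assumptions the result is close to 2.6 × 10⁻⁶, in line with the reported 2.59 × 10⁻⁶; the small difference may reflect rounding or a slightly different number of tested genes.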
Annotation-Weighted Gene-Based Analysis: Variants were annotated using 11 functional scores (CADD, FATHMM.XF, LINSIGHT, and several aPC categories) retrieved from the FAVOR database via the STAARpipelineSummary R package (version 0.9.7). Variants lacking annotation scores were excluded automatically. Weighted burden scores were computed for each gene using MAGMA with the --burden weights flag, and the resulting p-values were aggregated using the ACAT method to produce a single p-value per gene. As in the unweighted analysis, the PanScan-PanC4 and UK Biobank results were then combined by meta-analysis.
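The ACAT step can be illustrated with the standard Cauchy combination formula: each p-value is mapped to a Cauchy-distributed quantity, the quantities are averaged, and the average is mapped back to a p-value. The sketch below is a minimal plain-Python version, not the implementation used for this dataset; the function name and the default of equal weights across the annotation-specific p-values are assumptions.

```python
import math


def acat(p_values, weights=None):
    """Combine p-values with the aggregated Cauchy association test.

    Assumption: equal weights unless given. Inputs must lie strictly
    between 0 and 1; extremely small p-values may need the usual
    1 / (p * pi) approximation, which is omitted here for brevity.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(p <= 0.0 or p >= 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    total_weight = sum(weights)
    # Weighted mean of the Cauchy-transformed p-values.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total_weight
    # Upper tail of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical p-values for one gene, one per annotation score.
combined = acat([0.001, 0.5, 0.9])
```

A single input is returned unchanged (up to floating-point error), and one strong signal among weak ones still yields a small combined p-value, which is why this kind of test suits settings where only some annotations are informative.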
Key Findings: The dataset includes results from both the annotation-unweighted and annotation-weighted analyses and identifies genes significantly associated with PDAC risk. These include RIPK2, ST7L, MED23, PARD3, and RBFOX1 in the unweighted analysis and SEPTIN8, ADH5, and RABGAP1L in the weighted analysis.
Created:
2025-01-15



