Genome-Wide Pleiotropy Scan Across Multiple Cancers

NIAID Data Ecosystem, indexed 2026-04-30
Download link:
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002809.v1.p1
Resource description:
A whole-exome sequencing (WES) study was conducted in 3,233 cases diagnosed with multiple primary cancers and 3,229 matched cancer-free controls (90% non-Hispanic white, 3% African-American, 3% East Asian, and 4% Latino) selected from individuals in the Kaiser Permanente Research Bank (KPRB) who were members of the Kaiser Permanente Northern California (KPNC) health plan. Cancer-free controls were matched to cases on age at specimen collection (within 2 years), sex, genotyping array (which matched on self-reported race/ethnicity), closest distance using the first two principal components for genetic ancestry, and reagent kit. Cases and controls were drawn from two prospective KPRB cohorts: the Research Program on Genes, Environment and Health (RPGEH) and the ProHealth study.

Participants were sequenced by the Regeneron Genetics Center using the Illumina NovaSeq 6000 platform, and sample preparation and quality control were performed using a high-throughput, fully automated system [PMID: 33087929]. Reads were aligned to the GRCh38 reference genome, and variants were called using WeCall [PMID: 33087929]. Participants with sex discordance, 20x coverage at less than 80% of targeted sites, and/or contamination greater than 5% were excluded. After quality control, we retained 6,247 individuals (3,111 cases; 3,136 controls) for downstream analyses. Among participants selected for this WES study, 5,432 (2,299 cases; 3,133 controls) consented to deposition of data to the National Institutes of Health (NIH).

Further quality control was applied to filter low-quality variants. Genotype calls with low depth of coverage (DP) were set to missing (DP < 7 for SNPs and DP < 10 for indels), after which sites with low allele balance (AB), defined as variants without at least one sample having AB ≥ 15% for SNPs or AB ≥ 20% for indels, were removed. Lastly, variants with missingness > 10% and Hardy-Weinberg equilibrium p-value < 10^-15 were excluded.
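The sample-level and variant-level filters described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the study's pipeline: the `Call` record, the function names, and the input layout are all hypothetical, the Hardy-Weinberg p-value is assumed to be computed elsewhere, and the sketch assumes that a variant is dropped if it fails either the missingness or the Hardy-Weinberg criterion and that allele balance is checked only on calls that survive depth masking.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Thresholds taken from the description above.
DP_MIN = {"snp": 7, "indel": 10}        # minimum depth per genotype call
AB_MIN = {"snp": 0.15, "indel": 0.20}   # minimum allele balance in at least one sample
MAX_MISSINGNESS = 0.10
HWE_P_MIN = 1e-15


@dataclass(frozen=True)
class Call:
    """Hypothetical per-sample genotype call at one variant site."""
    depth: int
    allele_balance: Optional[float] = None  # None when no alternate allele is carried
    missing: bool = False


def exclude_sample(sex_discordant: bool, frac_sites_20x: float, contamination: float) -> bool:
    """Sample-level exclusion: sex discordance, <80% of sites at 20x, or >5% contamination."""
    return sex_discordant or frac_sites_20x < 0.80 or contamination > 0.05


def mask_low_depth(calls: List[Call], kind: str) -> List[Call]:
    """Set calls below the depth threshold for this variant type to missing."""
    return [replace(c, missing=c.missing or c.depth < DP_MIN[kind]) for c in calls]


def passes_allele_balance(calls: List[Call], kind: str) -> bool:
    """True if at least one non-missing call reaches the allele-balance threshold."""
    return any(
        not c.missing and c.allele_balance is not None and c.allele_balance >= AB_MIN[kind]
        for c in calls
    )


def keep_variant(calls: List[Call], kind: str, hwe_p: float) -> bool:
    """Apply depth masking, then the allele-balance, missingness, and HWE filters."""
    masked = mask_low_depth(calls, kind)
    if not passes_allele_balance(masked, kind):
        return False
    missingness = sum(c.missing for c in masked) / len(masked)
    # Assumption: failing either criterion removes the variant.
    return missingness <= MAX_MISSINGNESS and hwe_p >= HWE_P_MIN
```

Because the depth threshold differs by variant type, the same set of calls can pass as a SNP and fail as an indel: two of ten calls at depth 8 are kept for a SNP but masked for an indel, which pushes missingness to 20%.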
Further description of quality control and downstream single-variant and gene-based analyses is available in Cavazos et al., 2022 [medRxiv].

Inclusion criteria for the data deposited in dbGaP include all of the following: (1) individuals with multiple primary cancers and matched cancer-free controls; (2) successfully sequenced from extracted DNA; (3) provided explicit consent to have data deposited in an NIH-maintained database.

Exclusion criteria for the data deposited in dbGaP include any of the following: (1) subject requested withdrawal from the study after DNA extraction and sequencing; (2) validity of the link between biospecimen and study participant was questionable because of sex discordance.

The Kaiser Permanente RPGEH is a resource that was developed to facilitate research on the role of genetic and environmental factors in a wide variety of common diseases and healthy aging. The RPGEH links data from electronic medical records, survey data on demographic and behavioral factors, environmental data from geographic information system databases, and genetic data derived from biospecimens from participating health plan members.

Cases with multiple primary cancers were identified from the KPNC Cancer Registry, as updated through June 2016. The Cancer Registry captures data on all primary cancer cases newly diagnosed or treated at KPNC facilities. It conforms to standards of the North American Association of Central Cancer Registries and the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program. Controls were individuals without any cancer diagnosis through June 2016. Our WES study of rare and common variants included 3,111 cases and 3,136 controls. Among these individuals, 5,432 consented to deposition of data to the NIH.
Created:
2022-02-10