
Alzheimer's Disease Sequencing Project (ADSP)

Indexed from NIAID Data Ecosystem on 2026-04-25
Download link:
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000572.v8.p4
Resource description:
LOCATION CHANGE FOR ALZHEIMER'S DISEASE SEQUENCING PROJECT (ADSP) DATA: Please go to NIAGADS DSS to apply for build 38 ADSP genetic and phenotypic data. See Background below for more details. For instructions on how to access the additional ADSP data that are shared through NIAGADS DSS, visit the Application Instructions page.

Background: Additional sequencing data are continuously being generated by the ADSP. These data are mapped to the latest Genome Reference Consortium human genome build, GRCh38 (hg38), and are being shared through the NIA Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) Data Sharing Service (DSS). As of May 1, 2020 there are 4,789 whole genomes and 19,922 whole exomes available to the research community. Later in 2020 there will be a total of ~17,000 whole genomes and 19,922 whole exomes available through NIAGADS DSS (ng00067). The total number of genomes from multi-ethnic cohorts is anticipated to exceed 50,000. Please see the ADSP Design page for the complete study description.

ADSP whole exome and whole genome sequence data that were shared through dbGaP were mapped to GRCh37 (build 37). These data are from the Discovery Phase of the project (described below) and will continue to be available at this site.

STUDY DESCRIPTION FOR dbGaP BUILD 37 ADSP DATA: The overarching goals of the Alzheimer's Disease Sequencing Project (ADSP) are to: (1) identify new genomic variants contributing to increased risk of developing Alzheimer's Disease (AD), (2) identify new genomic variants contributing to protection against developing AD, and (3) provide insight as to why individuals with known risk factor variants escape from developing AD. These factors will be studied in multi-ethnic populations in order to identify new pathways for disease prevention.

Such a study of human genomic variation and its relationship to health and disease requires examination of a large number of study participants and needs to capture information about common and rare variants (both single nucleotide and copy number) in well-phenotyped individuals. Using existing samples from NIH-funded and other studies, three NHGRI-funded Large Scale Sequencing and Analysis Centers (LSAC), namely Broad, Baylor, and Washington University, produced the DNA sequence data. Variant call data are being made available to the scientific community through NIH-approved data repositories. Statistical analysis of the sequence data is anticipated to identify new genetic risk and protective factors. The ADSP will conduct and facilitate analysis of sequence data to extend previous discoveries that may ultimately result in new directions for AD therapeutics.

Analysis of ADSP data will be done in two phases. The Discovery Phase analysis (2014-2018) is funded under PAR-12-183. The entire Discovery dataset contains whole-genome sequencing data on 584 subjects from 113 families, and pedigree data for > 4000 subjects; whole exome sequencing data on 5096 cases and 4965 controls; and whole exome sequence data on an additional 853 subjects (682 cases [510 non-Hispanic, 172 Hispanic] and 171 Hispanic controls) from families that are multiply affected with AD. The Replication Phase (2016-2021) analysis will be funded under RFA-AG-16-001 and RFA-AG-16-002 and is expected to include a combination of genotyping and sequencing approaches on at least 30,000 subjects. Targeted sequencing will be done by the LSACs.

GRCh37 Data Releases

The first ADSP data release occurred on November 25, 2013. It included the whole-genome sequencing data in BAM file format on 410 individuals. The second ADSP data release occurred on March 31, 2014, and included the whole-genome sequencing data in BAM file format for an additional 168 individuals.
The third ADSP data release occurred on November 3, 2014 and included whole-exome sequencing data in BAM file format for 10,939 individuals. The fourth ADSP data release occurred on February 13, 2015 and included revised ethnic data for subjects with whole-exome sequencing data. The fifth ADSP data release occurred on July 13, 2015 and included whole-genome genotypes and updated phenotypes as well as changes to pedigree structures and sample IDs. The sixth ADSP data release occurred on December 8, 2015, and included whole-exome genotypes and updated phenotypes as well as changes to subject IDs. This seventh ADSP data release on April 12, 2016 includes: (1) WES and WGS SNV VCF files and (2) WES and WGS Indel PLINK files.

ADSP Data Available through dbGaP:

Data type | ADSP - Whole Genome Sequencing | ADSP - Whole Exome Sequencing | Comments
DNA-Seq (BAM) | n=578 | n=10913 | Sequence data available (plus n=38 replications without genotype data)
Concordant SNV Genotypes (PLINK format) | N/A | n=10913 | QC'ed genotypes that are concordant between the Atlas (Baylor's) and GATK (Broad's) calling pipelines (a subset of the consensus genotype set)
Consensus Genotypes (PLINK and VCF format) | n=578 | n=10913 | QC'ed genotypes that are concordant between the Atlas and GATK pipelines as well as those that were called uniquely by Atlas or GATK
Concordant Indel Genotypes (PLINK format) | n=578 | n=10913 | QC'ed genotypes that are concordant between the Atlas and GATK calling pipelines
Phenotype Data | n=4735 | n=10913 | Data of n=53 phenotype variables available (plus administrative data), including APOE genotype. WGS phenotypes include data of connecting family members.

Please use the release notes provided by dbGaP to obtain detailed information about study release updates. The ADSP data portal provides a customized interface for users to quickly identify and retrieve files by covariates, phenotypes, and data properties such as sequencing facility or coverage.
For more information about the ADSP study and the data portal, please visit https://www.niagads.org/adsp/.

The samples for the ADSP have been selected from well-characterized cohorts of individuals who have been characterized for AD diagnosis and have known AD genetic risk factors. Investigators in the ADSP will obtain from the NIH-approved data repositories: (1) quality control checked and 'cleaned' sequence data. 'Quality control checked and cleaned' means that a set of routine checks has been performed on sample information, phenotype, and GWAS data to ensure that the sequence data are of high quality and ready for downstream genetic analysis, that likely sources of false positives have been ruled out, and that outlier samples which may skew project-level analyses have been identified; (2) information on the composition of the study cohorts (e.g. case-control, family-based, and epidemiology cohorts); (3) descriptions of the study cohorts included in the study; and (4) accompanying phenotypic information such as age at disease onset, self-reported race/ethnicity, gender, diagnostic status, and cognitive measures. The ADSP will determine what additional information, if any, is needed by its members to facilitate the project.

On February 7, 2012, a new Presidential Initiative was announced to fight Alzheimer's Disease (AD). As part of this effort, the National Human Genome Research Institute (NHGRI) was asked by the Director of the National Institutes of Health (NIH) to use $25M already committed to its Large-Scale Sequencing Centers (LSSC) for genomic studies in AD.
The NIH director asked the National Institute on Aging (NIA) and the NHGRI to work together to develop and execute a large scale sequencing project to analyze the genomes of a large number of well characterized individuals in order to identify a broad range of AD risk and protective gene variants, with the ultimate goal of facilitating the identification of new pathways for therapeutic approaches and prevention. The analysis will also provide insight as to why individuals with known risk factor genes escape from developing AD. The project, developed jointly by NIA and NHGRI, is called the Alzheimer's Disease Sequencing Project (ADSP).
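
The sample counts quoted in the description above can be cross-checked with simple arithmetic. The following is a minimal Python sketch, not part of the ADSP or dbGaP tooling: the variable names are illustrative assumptions, and the numbers are copied from the text. It tallies the first two WGS BAM releases and the family-based whole-exome subgroup, then reports how the described whole-exome subject total compares with the n=10913 listed under data available through dbGaP. The description does not say how those two figures relate, so the sketch only prints the difference and draws no conclusion from it.

```python
# Counts copied from the study description; names are illustrative only.
WGS_BAM_RELEASES = {"2013-11-25": 410, "2014-03-31": 168}
CASE_CONTROL_WES = {"cases": 5096, "controls": 4965}
FAMILY_WES = {
    "cases_non_hispanic": 510,
    "cases_hispanic": 172,
    "controls_hispanic": 171,
}
TABLE_WGS_BAM = 578    # DNA-Seq (BAM), whole-genome column
TABLE_WES_BAM = 10913  # DNA-Seq (BAM), whole-exome column

# WGS BAM files delivered by the first two releases.
wgs_bam_total = sum(WGS_BAM_RELEASES.values())

# Family-based whole-exome subgroup: cases, then cases plus controls.
family_cases = FAMILY_WES["cases_non_hispanic"] + FAMILY_WES["cases_hispanic"]
family_total = family_cases + FAMILY_WES["controls_hispanic"]

# All whole-exome subjects named in the Discovery dataset description.
described_wes_total = sum(CASE_CONTROL_WES.values()) + family_total

print("WGS BAMs, releases 1-2:", wgs_bam_total, "| table:", TABLE_WGS_BAM)
print("Family WES cases:", family_cases)
print("Family WES subjects:", family_total)
print("Described WES subjects:", described_wes_total, "| table:", TABLE_WES_BAM)
print("Difference (described - table):", described_wes_total - TABLE_WES_BAM)
```

The first three tallies reproduce the 578, 682, and 853 figures given in the description. Release notes from dbGaP remain the authoritative source for reconciling any remaining differences between counts.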

Created:
2020-05-15