
Supporting data for "BIGwas - Single-command Quality Control and Association Testing for multi-cohort and biobank-scale GWAS/PheWAS data"

Source: DataCite Commons. Updated: 2025-05-26. Indexed: 2025-04-15.
Download link:
http://gigadb.org/dataset/100895
Description:
Genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) involving one million GWAS samples from dozens of population-based biobanks present a considerable computational challenge and are typically carried out by large scientific groups at great cost in time and personnel. Automating these processes requires highly efficient and scalable methods and software, but so far there is no workflow solution that can easily process one million GWAS samples. Here we present BIGwas, a portable, fully automated quality control (QC) and association testing pipeline for large-scale binary and quantitative trait GWAS data provided by biobank resources. By using the Nextflow workflow system and Singularity software container technology, BIGwas performs resource-efficient and reproducible analyses on a local computer or any high-performance computing (HPC) system with just one command, with no need to manually install a software execution environment or individual software packages. For a single-command GWAS analysis with 974,818 individuals and 92 million genetic markers, BIGwas takes around 16 days on a small HPC system with only seven compute nodes to perform a complete GWAS QC and association analysis protocol. Our dynamic parallelization approach enables shorter runtimes on larger HPC systems. Researchers without extensive bioinformatics knowledge and with few computing resources can use BIGwas to perform multi-cohort GWAS with one million GWAS samples and, if desired, use it to build their own (genome-wide) PheWAS resource.

Provider:
GigaScience Database
Created:
2021-05-19