
introvoyz041/pgc-schizophrenia

Hugging Face · Updated 2026-04-09 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/introvoyz041/pgc-schizophrenia
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-regression
- tabular-classification
tags:
- gwas
- summary-statistics
- psychiatric-genomics
- pgc
- scz
- mental-health
- genetics
- genomics
- biology
- health
- bioinformatics
pretty_name: PGC Schizophrenia GWAS Summary Statistics
size_categories:
- 1M-10M
configs:
- config_name: scz2011
  data_files:
  - split: train
    path: data/scz2011/*.parquet
  default: true
- config_name: scz2013sweden
  data_files:
  - split: train
    path: data/scz2013sweden/*.parquet
- config_name: scz2014
  data_files:
  - split: train
    path: data/scz2014/*.parquet
- config_name: scz2018clozuk
  data_files:
  - split: train
    path: data/scz2018clozuk/*.parquet
- config_name: scz2019asi
  data_files:
  - split: train
    path: data/scz2019asi/*.parquet
- config_name: scz2022
  data_files:
  - split: train
    path: data/scz2022/*.parquet
language:
- en
source_datasets:
- pgc
---

# PGC Schizophrenia — GWAS Summary Statistics

[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey)](https://creativecommons.org/licenses/by/4.0/)

## Dataset Description

Genome-wide association study (GWAS) summary statistics for **Schizophrenia** phenotypes from the [Psychiatric Genomics Consortium (PGC)](https://pgc.unc.edu/).

This dataset contains multiple GWAS publications as separate subsets (configs). Each can be loaded independently.

## Usage

```python
from datasets import load_dataset

# Load a specific GWAS (e.g., scz2011)
ds = load_dataset("OpenMed/pgc-schizophrenia", "scz2011")
print(ds)
```

### Available Configs

```python
from datasets import get_dataset_config_names

configs = get_dataset_config_names("OpenMed/pgc-schizophrenia")
print(configs)
```

## Subsets (Publications)

| Config | Phenotype | Journal | Year | PubMed | Rows | License |
|--------|-----------|---------|------|--------|------|---------|
| `scz2011` | Schizophrenia | Nature Genetics | 2011 | [21926974](https://pubmed.ncbi.nlm.nih.gov/21926974/) | — | CC BY 4.0 |
| `scz2013sweden` | Schizophrenia (Swedish) | Nature Genetics | 2013 | [23974872](https://pubmed.ncbi.nlm.nih.gov/23974872/) | — | CC BY 4.0 |
| `scz2014` | Schizophrenia | Nature | 2014 | [25056061](https://pubmed.ncbi.nlm.nih.gov/25056061/) | 10,172,956 | CC BY 4.0 |
| `scz2018clozuk` | Schizophrenia (CLOZUK+PGC2) | Nature Genetics | 2018 | [29483656](https://pubmed.ncbi.nlm.nih.gov/29483656/) | — | CC BY 4.0 |
| `scz2019asi` | Schizophrenia (East Asian + European) | Nature Genetics | 2019 | [31740837](https://pubmed.ncbi.nlm.nih.gov/31740837/) | 24,253,547 | CC BY 4.0 |
| `scz2022` | Schizophrenia | Nature | 2022 | [35396580](https://pubmed.ncbi.nlm.nih.gov/35396580/) | 52,560,584 | CC BY 4.0 |

## Data Format

All data has been converted to **Apache Parquet** format with shards of 10,000 rows. Common columns include:

| Column | Description |
|--------|-------------|
| `SNP` / `ID` | SNP rsID or variant identifier |
| `CHR` | Chromosome |
| `BP` / `POS` | Base-pair position (typically GRCh37/hg19) |
| `A1` / `ALT` | Effect allele |
| `A2` / `REF` | Non-effect (reference) allele |
| `OR` / `BETA` | Odds ratio or effect size |
| `SE` | Standard error |
| `P` | P-value |
| `INFO` | Imputation quality score |
| `FRQ` / `MAF` | Allele frequency |
| `_source_file` | Original source filename |

> **Note:** Column names vary between publications.
> The `_source_file` column tracks the original file each row came from.

## Citation

When using any subset, please cite:

1. The **original publication** (see PubMed links above)
2. The **data DOI** from Figshare (see supplementary metadata)
3. **Acknowledge the PGC:**
   > "Data were obtained from the Psychiatric Genomics Consortium — https://pgc.unc.edu/"

## Terms of Use

This dataset is released under the **[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)** license.

By using PGC summary statistics you agree to:

1. Cite the original publication(s)
2. Not attempt to re-identify individual participants
3. Comply with the PGC's [data use policies](https://pgc.unc.edu/for-researchers/data-access/)

## Source

- **Consortium:** [Psychiatric Genomics Consortium (PGC)](https://pgc.unc.edu/)
- **PGC Downloads:** [pgc.unc.edu/for-researchers/download-results/](https://pgc.unc.edu/for-researchers/download-results/)

---

*Last updated: April 2026*
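
## Example: harmonizing column names

The Data Format section notes that column names vary between publications. The following is a minimal sketch of one way to map the alternative names from the common-columns table onto a single schema before comparing subsets. The canonical names (`variant_id`, `chrom`, and so on) and the `harmonize_row` helper are hypothetical choices made for illustration and are not part of the dataset. The sketch also assumes that a `BETA` column is already on the log-odds scale, so an `OR` value is converted with a natural logarithm. Note that `FRQ` and `MAF` are folded into one field here only because the table groups them; they may not be interchangeable in a given subset.

```python
import math

# Hypothetical alias map derived from the "Common columns" table.
# The canonical names on the right are illustrative, not defined by the dataset.
ALIASES = {
    "SNP": "variant_id", "ID": "variant_id",
    "CHR": "chrom",
    "BP": "pos", "POS": "pos",
    "A1": "effect_allele", "ALT": "effect_allele",
    "A2": "other_allele", "REF": "other_allele",
    "SE": "se", "P": "p", "INFO": "info",
    "FRQ": "freq", "MAF": "freq",
}


def harmonize_row(row):
    """Return a copy of `row` with canonical names and a log-odds `beta`."""
    out = {}
    for name, value in row.items():
        key = name.upper()
        if key in ALIASES:
            out[ALIASES[key]] = value
        elif key not in ("OR", "BETA"):
            out[name] = value  # keep unknown columns (e.g. _source_file) as-is

    # Assumption: BETA is already a log-odds effect; OR is converted via ln(OR).
    upper = {k.upper(): v for k, v in row.items()}
    if upper.get("BETA") is not None:
        out["beta"] = float(upper["BETA"])
    elif upper.get("OR") is not None:
        out["beta"] = math.log(float(upper["OR"]))
    return out


# Made-up row for illustration only; these values are not taken from the dataset.
example = {"SNP": "rs1", "CHR": "1", "BP": 12345, "A1": "A", "A2": "G",
           "OR": "1.0", "P": "0.5", "_source_file": "example.txt"}
print(harmonize_row(example))
```

The helper works on plain dictionaries, such as the rows yielded when iterating over a loaded split, so it can be checked against a few rows from each config before being applied more widely.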

Provider:
introvoyz041