introvoyz041/pgc-schizophrenia
Source: Hugging Face · Updated 2026-04-09 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/introvoyz041/pgc-schizophrenia
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-regression
- tabular-classification
tags:
- gwas
- summary-statistics
- psychiatric-genomics
- pgc
- scz
- mental-health
- genetics
- genomics
- biology
- health
- bioinformatics
pretty_name: PGC Schizophrenia GWAS Summary Statistics
size_categories:
- 1M-10M
configs:
- config_name: scz2011
  data_files:
  - split: train
    path: data/scz2011/*.parquet
  default: true
- config_name: scz2013sweden
  data_files:
  - split: train
    path: data/scz2013sweden/*.parquet
- config_name: scz2014
  data_files:
  - split: train
    path: data/scz2014/*.parquet
- config_name: scz2018clozuk
  data_files:
  - split: train
    path: data/scz2018clozuk/*.parquet
- config_name: scz2019asi
  data_files:
  - split: train
    path: data/scz2019asi/*.parquet
- config_name: scz2022
  data_files:
  - split: train
    path: data/scz2022/*.parquet
language:
- en
source_datasets:
- pgc
---
# PGC Schizophrenia — GWAS Summary Statistics
[License: CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
## Dataset Description
Genome-wide association study (GWAS) summary statistics for **Schizophrenia** phenotypes from the [Psychiatric Genomics Consortium (PGC)](https://pgc.unc.edu/).
The summary statistics from each GWAS publication are provided as a separate subset (config), and each subset can be loaded independently.
## Usage
```python
from datasets import load_dataset
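# Load a specific GWAS (e.g., scz2011).
# Note: this card references the OpenMed repository; the copy indexed here is
# published as introvoyz041/pgc-schizophrenia.
ds = load_dataset("OpenMed/pgc-schizophrenia", "scz2011")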
print(ds)
```
### Available Configs
```python
from datasets import get_dataset_config_names
configs = get_dataset_config_names("OpenMed/pgc-schizophrenia")
print(configs)
```
## Subsets (Publications)
| Config | Phenotype | Journal | Year | PubMed | Rows | License |
|--------|-----------|---------|------|--------|------|---------|
| `scz2011` | Schizophrenia | Nature Genetics | 2011 | [21926974](https://pubmed.ncbi.nlm.nih.gov/21926974/) | — | CC BY 4.0 |
| `scz2013sweden` | Schizophrenia (Swedish) | Nature Genetics | 2013 | [23974872](https://pubmed.ncbi.nlm.nih.gov/23974872/) | — | CC BY 4.0 |
| `scz2014` | Schizophrenia | Nature | 2014 | [25056061](https://pubmed.ncbi.nlm.nih.gov/25056061/) | 10,172,956 | CC BY 4.0 |
| `scz2018clozuk` | Schizophrenia (CLOZUK+PGC2) | Nature Genetics | 2018 | [29483656](https://pubmed.ncbi.nlm.nih.gov/29483656/) | — | CC BY 4.0 |
| `scz2019asi` | Schizophrenia (East Asian + European) | Nature Genetics | 2019 | [31740837](https://pubmed.ncbi.nlm.nih.gov/31740837/) | 24,253,547 | CC BY 4.0 |
| `scz2022` | Schizophrenia | Nature | 2022 | [35396580](https://pubmed.ncbi.nlm.nih.gov/35396580/) | 52,560,584 | CC BY 4.0 |
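Several subsets hold tens of millions of rows, so it may be preferable to stream rows rather than load a whole subset into memory. The sketch below is an illustration under assumptions, not part of the original card: the helper names `open_stream` and `iter_significant` are hypothetical, the P-value column is assumed to be named `P` (column names vary between publications), and 5e-8 is the conventional genome-wide significance threshold rather than a value specified by this dataset.

```python
from typing import Any, Dict, Iterable, Iterator


def open_stream(config: str, repo: str = "OpenMed/pgc-schizophrenia"):
    """Open one subset lazily (hypothetical helper; needs network access)."""
    # Imported here so the filtering helper below works without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo, config, split="train", streaming=True)


def iter_significant(
    rows: Iterable[Dict[str, Any]],
    p_column: str = "P",      # assumed column name; check each subset
    threshold: float = 5e-8,  # conventional genome-wide threshold
) -> Iterator[Dict[str, Any]]:
    """Yield rows whose P-value is at or below `threshold`."""
    for row in rows:
        try:
            p_value = float(row[p_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or unparseable P-value
        if p_value <= threshold:
            yield row


# Synthetic rows for illustration only; these are not real results.
sample_rows = [
    {"SNP": "rs_example_1", "P": "3e-9"},
    {"SNP": "rs_example_2", "P": 0.04},
    {"SNP": "rs_example_3", "P": None},
]
hits = list(iter_significant(sample_rows))
print([row["SNP"] for row in hits])
```

With network access, `iter_significant(open_stream("scz2014"))` would apply the same filter to a streamed subset without downloading every shard first.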
## Data Format
All data has been converted to **Apache Parquet** format with shards of 10,000 rows. Common columns include:
| Column | Description |
|--------|-------------|
| `SNP` / `ID` | SNP rsID or variant identifier |
| `CHR` | Chromosome |
| `BP` / `POS` | Base-pair position (typically GRCh37/hg19) |
| `A1` / `ALT` | Effect allele |
| `A2` / `REF` | Non-effect (reference) allele |
| `OR` / `BETA` | Odds ratio (`OR`) or effect size (`BETA`, typically the log odds ratio) |
| `SE` | Standard error |
| `P` | P-value |
| `INFO` | Imputation quality score |
| `FRQ` / `MAF` | Allele frequency |
| `_source_file` | Original source filename |
> **Note:** Column names vary between publications. The `_source_file` column tracks the original file each row came from.
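Because column names differ across publications, downstream code may want to map them onto one naming scheme first. The sketch below is one possible approach, not the maintainers' method: the alias table simply follows the pairs listed in the table above, the function name `harmonize_row` is hypothetical, and treating `BETA` as the natural log of `OR` assumes a logistic-regression effect estimate, which should be checked against each publication's documentation.

```python
import math
from typing import Any, Dict

# Canonical name -> alias seen in some subsets (taken from the table above).
# FRQ / MAF is left out on purpose: a minor-allele frequency is not
# necessarily the frequency of the effect allele A1.
ALIASES = {
    "SNP": "ID",
    "BP": "POS",
    "A1": "ALT",
    "A2": "REF",
}


def harmonize_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `row` with canonical column names and a BETA value."""
    out = dict(row)
    for canonical, alias in ALIASES.items():
        if canonical not in out and alias in out:
            out[canonical] = out.pop(alias)
    # Assumption: BETA is the log odds ratio, so derive it when only OR exists.
    if "BETA" not in out and out.get("OR") is not None:
        out["BETA"] = math.log(float(out["OR"]))
    return out


# Synthetic row for illustration only; not a real association result.
example = {"ID": "rs_example", "CHR": 1, "POS": 12345, "ALT": "A",
           "REF": "G", "OR": 1.0, "SE": 0.02, "P": 0.5}
clean = harmonize_row(example)
print(sorted(clean))
```

Rows that already carry a `BETA` value are left unchanged, so the helper can be applied uniformly across subsets.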
## Citation
When using any subset, please cite:
1. The **original publication** (see PubMed links above)
2. The **data DOI** from Figshare (see supplementary metadata)
3. **Acknowledge the PGC:**
> "Data were obtained from the Psychiatric Genomics Consortium — https://pgc.unc.edu/"
## Terms of Use
This dataset is released under the **[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)** license.
By using PGC summary statistics you agree to:
1. Cite the original publication(s)
2. Not attempt to re-identify individual participants
3. Comply with the PGC's [data use policies](https://pgc.unc.edu/for-researchers/data-access/)
## Source
- **Consortium:** [Psychiatric Genomics Consortium (PGC)](https://pgc.unc.edu/)
- **PGC Downloads:** [pgc.unc.edu/for-researchers/download-results/](https://pgc.unc.edu/for-researchers/download-results/)
---
*Last updated: April 2026*
Provider: introvoyz041



