introvoyz041/pgc-cross-disorder
Source: Hugging Face (updated 2026-04-09, indexed 2026-04-26)
Download link:
https://hf-mirror.com/datasets/introvoyz041/pgc-cross-disorder
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-regression
- tabular-classification
tags:
- gwas
- summary-statistics
- psychiatric-genomics
- pgc
- cdg
- mental-health
- genetics
- genomics
- biology
- health
- bioinformatics
pretty_name: PGC Cross-Disorder GWAS Summary Statistics
size_categories:
- 1M-10M
configs:
- config_name: cdg2013
  default: true
  data_files:
  - split: train
    path: data/cdg2013/*.parquet
- config_name: cdg2018-bip-scz
  data_files:
  - split: train
    path: data/cdg2018-bip-scz/*.parquet
- config_name: cdg2019
  data_files:
  - split: train
    path: data/cdg2019/*.parquet
- config_name: cdg2020-bip-mdd
  data_files:
  - split: train
    path: data/cdg2020-bip-mdd/*.parquet
- config_name: cdg2025
  data_files:
  - split: train
    path: data/cdg2025/*.parquet
language:
- en
source_datasets:
- pgc
---
# PGC Cross-Disorder — GWAS Summary Statistics
[License: CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
## Dataset Description
Genome-wide association study (GWAS) summary statistics for **Cross-Disorder** phenotypes from the [Psychiatric Genomics Consortium (PGC)](https://pgc.unc.edu/).
Each publication is available as a separate subset (config) and can be loaded independently.
## Usage
```python
from datasets import load_dataset

# Load a specific GWAS by passing its config name as the second argument
ds = load_dataset("OpenMed/pgc-cross-disorder", "cdg2013")
print(ds)  # each config exposes a single "train" split
```
### List all available subsets
```python
from datasets import get_dataset_config_names
print(get_dataset_config_names("OpenMed/pgc-cross-disorder"))
```
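When working with several subsets, it can help to check the config name up front and stream rows instead of downloading every shard. The sketch below assumes the config names listed in this card's metadata and the same repository ID as the example above. `KNOWN_CONFIGS` and `load_subset` are hypothetical helper names, not part of the `datasets` library.

```python
# Config names copied from this card's metadata (assumed to be current).
KNOWN_CONFIGS = (
    "cdg2013",
    "cdg2018-bip-scz",
    "cdg2019",
    "cdg2020-bip-mdd",
    "cdg2025",
)


def load_subset(config, repo_id="OpenMed/pgc-cross-disorder", streaming=True):
    """Hypothetical helper: load one subset's train split, streaming by default."""
    if config not in KNOWN_CONFIGS:
        raise ValueError(
            f"Unknown config {config!r}; expected one of {KNOWN_CONFIGS}"
        )
    # Imported lazily so the name check above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, config, split="train", streaming=streaming)
```

With `streaming=True`, `load_subset("cdg2013")` returns an iterable dataset, so `next(iter(load_subset("cdg2013")))` yields the first row as a dict without fetching the full subset.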
## Subsets
| Config | Phenotype | Journal | Year | PubMed | Rows |
|--------|-----------|---------|------|--------|------|
| `cdg2013` | Multiple Psychiatric Disorders | The Lancet | 2013 | [23453885](https://pubmed.ncbi.nlm.nih.gov/23453885/) | — |
| `cdg2018-bip-scz` | Bipolar Disorder & Schizophrenia | Cell | 2018 | [29906448](https://pubmed.ncbi.nlm.nih.gov/29906448/) | — |
| `cdg2019` | Multiple Psychiatric Disorders | Cell | 2019 | [31835028](https://pubmed.ncbi.nlm.nih.gov/31835028/) | — |
| `cdg2020-bip-mdd` | Bipolar Disorder & Major Depression | Biological Psychiatry | 2020 | [31926635](https://pubmed.ncbi.nlm.nih.gov/31926635/) | — |
| `cdg2025` | Multiple Psychiatric Disorders | Nature | 2025 | Pending | — |
## Data Format
All data is stored as **Apache Parquet** shards (10,000 rows each). Common columns:
| Column | Description |
|--------|-------------|
| `SNP` / `ID` | SNP rsID or variant identifier |
| `CHR` | Chromosome |
| `BP` / `POS` | Base-pair position (typically GRCh37/hg19) |
| `A1` | Effect allele |
| `A2` | Non-effect allele |
| `OR` / `BETA` | Odds ratio or effect size |
| `SE` | Standard error |
| `P` | P-value |
| `_source_file` | Original source filename |
> Column names vary between publications. Check each subset's schema.
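Because column names differ between subsets, a small harmonization step can make downstream code uniform. The following is a minimal sketch, assuming the aliases in the table above (`ID` for `SNP`, `POS` for `BP`) and assuming that `BETA`, where present, is already on the log-odds scale. `ALIASES`, `build_rename_map`, and `effect_as_beta` are hypothetical names introduced for illustration, and some subsets may use other spellings or letter case, so check each schema first.

```python
import math

# Alias -> canonical name, taken from the column table above.
ALIASES = {"ID": "SNP", "POS": "BP"}


def build_rename_map(column_names):
    """Map alias columns to canonical names, skipping any that would collide."""
    present = set(column_names)
    return {
        old: new
        for old, new in ALIASES.items()
        if old in present and new not in present
    }


def effect_as_beta(row):
    """Return the effect on the log-odds scale from a row dict.

    Uses BETA when present, otherwise the natural log of OR.
    """
    if row.get("BETA") is not None:
        return float(row["BETA"])
    if row.get("OR") is not None:
        return math.log(float(row["OR"]))
    raise KeyError("row has neither BETA nor OR")
```

The rename map can be passed to the `datasets` method `rename_columns`, for example `ds["train"].rename_columns(build_rename_map(ds["train"].column_names))`. Note that `SE` is left untouched here; whether it refers to the odds ratio or to its logarithm should be confirmed against the original publication for each subset.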
## Citation
Please cite the original publication (see PubMed links above) and acknowledge the PGC:
> Data were obtained from the Psychiatric Genomics Consortium — https://pgc.unc.edu/
## Terms of Use
Released under **[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)**.
- Cite the original publication(s)
- Do not attempt to re-identify individual participants
- Comply with the PGC [data use policies](https://pgc.unc.edu/for-researchers/data-access/)
## Source
- [Psychiatric Genomics Consortium (PGC)](https://pgc.unc.edu/)
- [PGC Downloads](https://pgc.unc.edu/for-researchers/download-results/)
---
*Last updated: April 2026*
Provided by: introvoyz041



