victormaricato/glof
Source: Hugging Face · Updated: 2026-04-13 · Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/victormaricato/glof
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
language:
- en
tags:
- biology
- genomics
- clinical-genetics
- missense-variants
- loss-of-function
- gain-of-function
- variant-effect-prediction
- benchmark
- precision-medicine
size_categories:
- 100K<n<1M
---
# GLOF: A large-scale expert-curated benchmark of missense variant functional effects
GLOF (Gain and Loss Of Function) is a benchmark dataset of **112,399 missense variants** across **2,809 human genes**, each classified as **LOF** (loss-of-function), **GOF** (gain-of-function), or **Neutral** by board-certified clinical geneticists.
## Dataset Description
The dataset was curated at [Mendelics Analise Genomica](https://www.mendelics.com/), one of Latin America's largest clinical genomics laboratories. The annotation process integrated ClinVar pathogenicity classifications, published functional studies, established gene-disease relationships, and expert clinical judgment following ACMG/AMP guidelines.
- **Pathogenic variants** were sourced from ClinVar (July 2023 release) and cross-referenced against the March 2026 release; variants with reclassified or conflicting evidence were excluded.
- **Neutral variants** were drawn from gnomAD v3.1 and validated against v4.1 allele frequencies, selecting missense variants with AF > 1%.
## Dataset Schema
| Field | Type | Description |
|-------|------|-------------|
| VARIANTKEY | String | Unique variant identifier: `chr-position-ref-alt` (GRCh38) |
| LABEL | String | Functional classification: `Neutral`, `LOF`, or `GOF` |
| ENSG | String | Ensembl gene identifier |
| GENE_SYMBOL | String | HGNC gene symbol |
| AA_POSITION | Integer | Amino acid substitution position in the canonical protein |
| PROTEIN_REF | Character | Reference (wild-type) amino acid |
| PROTEIN_ALT | Character | Alternate (mutant) amino acid |
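As an illustration of the `VARIANTKEY` layout described in the table, the following minimal sketch splits a key into its four components. The helper name `parse_variant_key`, the `Variant` tuple, and the example key are hypothetical, and whether the chromosome field carries a `chr` prefix is an assumption, so the sketch accepts both forms.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Components of a `chr-position-ref-alt` key (hypothetical helper type)."""
    chrom: str
    position: int
    ref: str
    alt: str


def parse_variant_key(key: str) -> Variant:
    """Split a VARIANTKEY string into chromosome, position, ref, and alt."""
    parts = key.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 'chr-position-ref-alt', got {key!r}")
    chrom, position, ref, alt = parts
    # Assumption: a leading 'chr' prefix may or may not be present; drop it if so.
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return Variant(chrom, int(position), ref.upper(), alt.upper())


# Made-up key for illustration only; not taken from the dataset.
example = parse_variant_key("1-12345-A-G")
print(example.chrom, example.position, example.ref, example.alt)
```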
## Class Distribution
| Class | Variants | Percentage | Genes |
|-------|----------|------------|-------|
| Neutral | 83,902 | 74.6% | 2,749 |
| LOF | 25,368 | 22.6% | 2,020 |
| GOF | 3,129 | 2.8% | 260 |
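The classes are strongly imbalanced, with GOF making up under 3% of variants. One common way to account for this when training a classifier is inverse-frequency class weighting. The sketch below recomputes the percentages from the counts in the table and derives such weights; the weighting scheme is an illustrative choice, not something the dataset authors prescribe.

```python
# Variant counts copied from the class distribution table.
CLASS_COUNTS = {"Neutral": 83_902, "LOF": 25_368, "GOF": 3_129}


def class_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of each class as a percentage, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weights of the form total / (num_classes * class_count).

    Rarer classes receive larger weights; a perfectly balanced dataset
    would give every class a weight of 1.0.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


print(class_percentages(CLASS_COUNTS))
print(inverse_frequency_weights(CLASS_COUNTS))
```

Weights like these can be passed to most classifiers' loss functions; whether they help depends on the model and the evaluation metric.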
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("victormaricato/glof")
```
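After loading, a quick sanity check is to tally the `LABEL` column and compare it with the class distribution table. The following is a minimal sketch: `count_labels` and `summarize_hub_dataset` are hypothetical helper names, and because the card does not state which splits the repository exposes, the sketch iterates over whatever splits `load_dataset` returns rather than assuming one.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_labels(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Tally the LABEL field over an iterable of row dictionaries."""
    return Counter(row["LABEL"] for row in rows)


def summarize_hub_dataset(repo_id: str = "victormaricato/glof") -> None:
    """Download the dataset and print label counts per split (needs network)."""
    # Imported here so count_labels stays usable without the `datasets` package.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)
    # Assumption: split names are not documented on the card, so list them all.
    for split_name, split in dataset.items():
        print(split_name, dict(count_labels(split)))


# Offline demonstration with made-up rows that follow the schema.
toy_rows = [{"LABEL": "LOF"}, {"LABEL": "Neutral"}, {"LABEL": "LOF"}]
print(count_labels(toy_rows))
```

Calling `summarize_hub_dataset()` runs the same tally against the downloaded data.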
## Citation
If you use this dataset, please cite:
```bibtex
@article{maricato2026glof,
title={GLOF: A large-scale expert-curated benchmark dataset of gain-of-function and loss-of-function missense variants},
author={Maricato, Victor and Schlesinger, David and de Souza Moura, Pedro Nuno},
year={2026}
}
```
## License
This dataset is released under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.
Provider:
victormaricato



