
victormaricato/glof

Hugging Face · Updated 2026-04-13 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/victormaricato/glof
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
language:
- en
tags:
- biology
- genomics
- clinical-genetics
- missense-variants
- loss-of-function
- gain-of-function
- variant-effect-prediction
- benchmark
- precision-medicine
size_categories:
- 100K<n<1M
---

# GLOF: A large-scale expert-curated benchmark of missense variant functional effects

GLOF (Gain and Loss Of Function) is a benchmark dataset of **112,399 missense variants** across **2,809 human genes**, each classified as **LOF** (loss-of-function), **GOF** (gain-of-function), or **Neutral** by board-certified clinical geneticists.

## Dataset Description

The dataset was curated at [Mendelics Analise Genomica](https://www.mendelics.com/), one of Latin America's largest clinical genomics laboratories. The annotation process integrated ClinVar pathogenicity classifications, published functional studies, established gene-disease relationships, and expert clinical judgment following ACMG/AMP guidelines.

- **Pathogenic variants** were sourced from ClinVar (July 2023 release) and cross-referenced against the March 2026 release; variants with reclassified or conflicting evidence were excluded.
- **Neutral variants** were drawn from gnomAD v3.1 and validated against v4.1 allele frequencies, selecting missense variants with AF > 1%.

## Dataset Schema

| Field | Type | Description |
|-------|------|-------------|
| VARIANTKEY | String | Unique variant identifier: `chr-position-ref-alt` (GRCh38) |
| LABEL | String | Functional classification: `Neutral`, `LOF`, or `GOF` |
| ENSG | String | Ensembl gene identifier |
| GENE_SYMBOL | String | HGNC gene symbol |
| AA_POSITION | Integer | Amino acid substitution position in the canonical protein |
| PROTEIN_REF | Character | Reference (wild-type) amino acid |
| PROTEIN_ALT | Character | Alternate (mutant) amino acid |

## Class Distribution

| Class | Variants | Percentage | Genes |
|-------|----------|------------|-------|
| Neutral | 83,902 | 74.6% | 2,749 |
| LOF | 25,368 | 22.6% | 2,020 |
| GOF | 3,129 | 2.8% | 260 |

## Usage

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub
dataset = load_dataset("victormaricato/glof")
```

## Citation

If you use this dataset, please cite:

```bibtex
@article{maricato2026glof,
  title={GLOF: A large-scale expert-curated benchmark dataset of gain-of-function and loss-of-function missense variants},
  author={Maricato, Victor and Schlesinger, David and de Souza Moura, Pedro Nuno},
  year={2026}
}
```

## License

This dataset is released under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.
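
As a rough illustration of the schema and class distribution described above, the sketch below splits a `VARIANTKEY` into its `chr-position-ref-alt` parts and recomputes the class percentages from the published counts. The names `Variant`, `parse_variantkey`, and `class_percentages` are hypothetical helpers, not part of the dataset or of any library. The example key is made up, and whether real keys carry a `chr` prefix is an assumption; the parser accepts both forms.

```python
from typing import Dict, NamedTuple


class Variant(NamedTuple):
    """Typed view of a VARIANTKEY (hypothetical helper, not part of the dataset)."""
    chrom: str
    position: int
    ref: str
    alt: str


def parse_variantkey(key: str) -> Variant:
    """Split a `chr-position-ref-alt` key into its four fields."""
    chrom, position, ref, alt = key.split("-")
    # Assumption: a leading "chr" prefix may or may not be present; drop it if so.
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return Variant(chrom, int(position), ref, alt)


# Variant counts copied from the Class Distribution table.
CLASS_COUNTS: Dict[str, int] = {"Neutral": 83_902, "LOF": 25_368, "GOF": 3_129}


def class_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each class's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    # Made-up key for illustration only; it is not taken from the dataset.
    print(parse_variantkey("chr1-12345-A-G"))
    print(sum(CLASS_COUNTS.values()))
    print(class_percentages(CLASS_COUNTS))
```

The three class counts sum to the 112,399 variants stated in the overview. Because GOF makes up under 3% of the labels, stratified splits or class-weighted metrics are likely worth considering when benchmarking a classifier on this data.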
Provider:
victormaricato