GnomXMiss: Missense Variant Annotations Derived From AlphaMissense Enriched With gnomAD, VEP, and ClinVar
Zenodo, updated 2025-12-16, indexed 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.17879697
Description:
GnomXMiss per-protein files
<uniprot_id>_GnomXMiss.tsv
Each file contains the available missense variants for a UniProt canonical protein present in the AlphaMissense data dump that could be cross-referenced with:
gnomAD allele frequencies (overall + subpopulation-specific)
AlphaMissense pathogenicity scores
VEP annotations (GENCODE v46)
ClinVar variant metadata
Only proteins for which all required annotations were successfully retrieved are included.
Each file is sorted by residue_number and contains variants from the AlphaMissense hg38 missense dump together with variants in the matching genomic intervals of gnomad.joint.v4.1.sites.ht. We outer-merge both sources on (#CHROM, POS, REF, ALT), so a variant present in only one source is kept, and we fill in missing annotations where possible.
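The outer merge described above can be sketched with pandas. This is a minimal illustration rather than the authors' pipeline: the two small tables and their values are made up, and only the key columns (#CHROM, POS, REF, ALT) and the column names am_pathogenicity and allele_frequency come from the description.

```python
import pandas as pd

# Merge keys named in the dataset description.
KEYS = ["#CHROM", "POS", "REF", "ALT"]

# Toy stand-ins for the two sources (all values are invented for illustration).
am = pd.DataFrame({
    "#CHROM": ["chr7", "chr7"],
    "POS": [100, 101],
    "REF": ["A", "C"],
    "ALT": ["T", "G"],
    "am_pathogenicity": [0.91, 0.12],
})
gnomad = pd.DataFrame({
    "#CHROM": ["chr7", "chr7"],
    "POS": [101, 102],
    "REF": ["C", "G"],
    "ALT": ["G", "A"],
    "allele_frequency": [0.002, 0.00001],
})

# how="outer" keeps variants found in only one source; indicator=True adds a
# "_merge" column recording where each row came from.
merged = am.merge(gnomad, on=KEYS, how="outer", indicator=True)
merged = merged.sort_values(["#CHROM", "POS"]).reset_index(drop=True)
print(merged)
```

Rows that exist in only one source carry empty values in the other source's columns, which is why some fields in the published files can be blank.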
Column descriptions
Below is the full set of columns that appear across the *_GnomXMiss.tsv files.
Primary identifiers
uniprot_id
Canonical UniProtKB accession for the protein associated with the variant.
#CHROM
Chromosome name (e.g., chr1, chr2, chrX, chrY) based on GRCh38.
POS
1-based genomic coordinate of the variant (GRCh38).
REF, ALT
Reference and alternate nucleotides from gnomAD.
Protein-level variant information
protein_variant
Amino-acid substitution in the format <RefAA><Protein_Sequence_Position><AltAA>, e.g. V600E.
residue_number
Integer position of the substitution extracted from protein_variant.
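As an illustration of how residue_number relates to protein_variant, the following sketch parses a substitution string with a regular expression. The helper name parse_protein_variant is hypothetical, and the pattern assumes single-letter amino-acid codes as in the V600E example.

```python
import re

# <RefAA><position><AltAA>, assuming single-letter uppercase amino-acid codes.
_VARIANT_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_protein_variant(protein_variant: str) -> tuple[str, int, str]:
    """Split a substitution such as 'V600E' into (ref_aa, residue_number, alt_aa)."""
    match = _VARIANT_PATTERN.fullmatch(protein_variant)
    if match is None:
        raise ValueError(f"Unrecognised protein_variant: {protein_variant!r}")
    ref_aa, position, alt_aa = match.groups()
    return ref_aa, int(position), alt_aa


print(parse_protein_variant("V600E"))  # ('V', 600, 'E')
```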
AlphaMissense annotation
am_class
Classification of variant pathogenicity:
likely_benign
ambiguous
likely_pathogenic
am_pathogenicity
Calibrated AlphaMissense pathogenicity score (0–1).
ClinVar annotations (if available)
existing_variation
ClinVar or dbSNP IDs extracted via Ensembl VEP.
clinvar_significance
ClinVar clinical significance, e.g. Benign, Likely_pathogenic, or VUS.
clinvar_condition
Name(s) of associated disease conditions.
clinvar_disease_id
Disease database identifiers (MEDGEN, MONDO, OMIM, etc.).
clinvar_variant_id
ClinVar Variation ID.
gnomAD allele frequency data (v4) (if available)
These fields are derived from the joint gnomAD v4 release and represent population-level statistics (no individual data).
Overall
allele_count
allele_number
allele_frequency
homozygote_count
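The overall fields are related in the usual gnomAD way: allele frequency is the allele count divided by the allele number. The sketch below assumes that convention holds for these columns; the function name and the handling of a zero allele number are illustrative choices, not part of the dataset.

```python
from typing import Optional


def allele_frequency(allele_count: int, allele_number: int) -> Optional[float]:
    """Return allele_count / allele_number, or None when no alleles were called."""
    if allele_number == 0:
        # No genotyped alleles at this site, so the frequency is undefined.
        return None
    return allele_count / allele_number


print(allele_frequency(5, 1000))  # 0.005
```

Such a helper can serve as a consistency check on a row: a large difference between the computed value and the stored allele_frequency would suggest a parsing problem.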
Population-specific allele frequencies
The dataset includes AF values for multiple ancestry groups:
nfe_allele_frequency — Non-Finnish European
fin_allele_frequency — Finnish
afr_allele_frequency — African/African American
eas_allele_frequency — East Asian
sas_allele_frequency — South Asian
amr_allele_frequency — Latino/Admixed American
ami_allele_frequency — Amish
asj_allele_frequency — Ashkenazi Jewish
mid_allele_frequency — Middle Eastern
remaining_allele_frequency — Remaining ancestry group
Sex-specific allele frequencies
XX_allele_frequency
XY_allele_frequency
(Some fields may be empty depending on variant representation in gnomAD.)
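Because some frequency fields can be empty, code that reads these files should treat blanks as missing values, not zeros. A minimal sketch with pandas follows, using an in-memory toy table in place of a real <uniprot_id>_GnomXMiss.tsv file. The rows are invented, only three of the population columns are shown, and the derived columns max_pop_af and has_gnomad_af are hypothetical names.

```python
import io

import pandas as pd

POP_COLS = [
    "nfe_allele_frequency",
    "afr_allele_frequency",
    "eas_allele_frequency",
]

# Toy tab-separated content; empty cells mimic variants without gnomAD data.
toy_tsv = (
    "protein_variant\tnfe_allele_frequency\tafr_allele_frequency\teas_allele_frequency\n"
    "V600E\t0.0001\t\t0.0003\n"
    "A12T\t\t\t\n"
)

# read_csv turns empty cells into NaN by default.
df = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

# Highest frequency among the populations that have a value (NaN if none do).
df["max_pop_af"] = df[POP_COLS].max(axis=1, skipna=True)

# Flag variants that have at least one population frequency.
df["has_gnomad_af"] = df[POP_COLS].notna().any(axis=1)

print(df[["protein_variant", "max_pop_af", "has_gnomad_af"]])
```

Keeping missing values as NaN avoids mistaking "not observed in gnomAD" for "observed at frequency zero" when filtering on rarity.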
Provider:
Zenodo
Created:
2025-12-10



