
GnomXMiss: Missense Variant Annotations Derived From AlphaMissense Enriched With gnomAD, VEP, and ClinVar

Source: Zenodo. Updated: 2025-12-16. Indexed: 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.17879697
Resource description:
GnomXMiss per-protein files: <uniprot_id>_GnomXMiss.tsv

Each file contains the available missense variants for a UniProt canonical protein present in the AlphaMissense data dump that could be cross-referenced with:
- gnomAD allele frequencies (overall and subpopulation-specific)
- AlphaMissense pathogenicity scores
- VEP annotations (GENCODE v46)
- ClinVar variant metadata

Only proteins for which all required annotations were successfully retrieved are included. The files are sorted by residue_number and contain variants from the AlphaMissense hg38 missense dump together with variants in the matching genomic intervals of gnomad.joint.v4.1.sites.ht. Both sources are outer-merged on (#CHROM, POS, REF, ALT), and variants that lack information are annotated where possible.

Column descriptions

Below is the full set of columns that appear across the *_GnomXMiss.tsv files.

Primary identifiers
- uniprot_id: Canonical UniProtKB accession for the protein associated with the variant.
- #CHROM: Chromosome name (e.g., chr1, chr2, chrX, chrY) based on GRCh38.
- POS: 1-based genomic coordinate of the variant (GRCh38).
- REF, ALT: Reference and alternate nucleotides from gnomAD.

Protein-level variant information
- protein_variant: Amino-acid substitution in the format <RefAA><Protein_Sequence_Position><AltAA>, e.g. V600E.
- residue_number: Integer position of the substitution, extracted from protein_variant.

AlphaMissense annotation
- am_class: Classification of variant pathogenicity; one of likely_benign, ambiguous, likely_pathogenic.
- am_pathogenicity: Calibrated AlphaMissense pathogenicity score (0–1).

ClinVar annotations (if available)
- existing_variation: ClinVar or dbSNP IDs extracted via Ensembl VEP.
- clinvar_significance: Clinical significance reported in ClinVar, e.g. Benign, Likely_pathogenic, VUS.
- clinvar_condition: Name(s) of associated disease conditions.
- clinvar_disease_id: Disease database identifiers (MEDGEN, MONDO, OMIM, etc.).
- clinvar_variant_id: ClinVar Variation ID.

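The outer merge on (#CHROM, POS, REF, ALT), the extraction of residue_number from protein_variant, and the sort by residue_number can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the helper names `residue_number` and `outer_merge` and the toy rows are hypothetical, and the sketch leaves gnomAD-only variants without a protein-level annotation instead of annotating them, which the real dataset is described as doing where possible.

```python
import re

# Merge key taken from the dataset description.
KEY = ("#CHROM", "POS", "REF", "ALT")

def residue_number(protein_variant):
    """Return the integer position from a string such as 'V600E', or None."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", protein_variant or "")
    return int(match.group(1)) if match else None

def outer_merge(am_rows, gnomad_rows):
    """Outer-merge two lists of dict rows on (#CHROM, POS, REF, ALT)."""
    merged = {}
    for row in list(am_rows) + list(gnomad_rows):
        key = tuple(row[k] for k in KEY)
        # Rows sharing a key are combined; rows unique to one source are kept.
        merged.setdefault(key, {}).update(row)
    return list(merged.values())

# Toy rows, invented purely for illustration (not real data).
am_rows = [
    {"#CHROM": "chr1", "POS": 100, "REF": "A", "ALT": "G",
     "protein_variant": "K12R", "am_pathogenicity": 0.10},
    {"#CHROM": "chr1", "POS": 40, "REF": "C", "ALT": "T",
     "protein_variant": "A5V", "am_pathogenicity": 0.80},
]
gnomad_rows = [
    {"#CHROM": "chr1", "POS": 100, "REF": "A", "ALT": "G",
     "allele_frequency": 0.001},
    {"#CHROM": "chr1", "POS": 130, "REF": "G", "ALT": "A",
     "allele_frequency": 0.02},
]

rows = outer_merge(am_rows, gnomad_rows)
for row in rows:
    row["residue_number"] = residue_number(row.get("protein_variant"))

# Sort by residue_number; rows without a protein-level annotation go last.
rows.sort(key=lambda r: (r["residue_number"] is None, r["residue_number"] or 0))
```

In this toy input, the variant present in both sources ends up as a single row carrying both the AlphaMissense score and the gnomAD frequency, while the variants unique to either source are retained with the other source's fields absent.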
gnomAD allele frequency data (v4) (if available)

These fields are derived from the joint gnomAD v4 release and represent population-level statistics (no individual data).

Overall
- allele_count
- allele_number
- allele_frequency
- homozygote_count

Population-specific allele frequencies
The dataset includes AF values for multiple ancestry groups:
- nfe_allele_frequency: Non-Finnish European
- fin_allele_frequency: Finnish
- afr_allele_frequency: African/African American
- eas_allele_frequency: East Asian
- sas_allele_frequency: South Asian
- amr_allele_frequency: Latino/Admixed American
- ami_allele_frequency: Amish
- asj_allele_frequency: Ashkenazi Jewish
- mid_allele_frequency: Middle Eastern
- remaining_allele_frequency: Remaining ancestry group

Sex-specific allele frequencies
- XX_allele_frequency
- XY_allele_frequency

(Some fields may be empty, depending on how the variant is represented in gnomAD.)
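As a usage illustration, a per-protein file could be read with Python's standard csv module and filtered on the columns described above. This is a sketch under assumptions: the helper names `to_float` and `rare_likely_pathogenic`, the 1e-4 frequency cutoff, the placeholder accession P00000, and the in-memory sample rows are all hypothetical, and missing values are assumed to appear as empty cells (any cell that does not parse as a number is also treated as missing).

```python
import csv
import io

def to_float(value):
    """Convert a TSV cell to float; empty or unparseable cells become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def rare_likely_pathogenic(handle, max_af=1e-4):
    """Return protein_variant values that are likely_pathogenic and rare.

    Variants with no gnomAD allele_frequency are kept, on the assumption
    that absence from gnomAD is compatible with being rare.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        af = to_float(row.get("allele_frequency"))
        if row.get("am_class") == "likely_pathogenic" and (af is None or af <= max_af):
            hits.append(row["protein_variant"])
    return hits

# In-memory stand-in for a <uniprot_id>_GnomXMiss.tsv file (invented rows,
# only a subset of the documented columns).
sample = (
    "uniprot_id\tprotein_variant\tam_class\tam_pathogenicity\tallele_frequency\n"
    "P00000\tA5V\tlikely_pathogenic\t0.91\t\n"
    "P00000\tK12R\tlikely_benign\t0.08\t0.02\n"
    "P00000\tG20D\tlikely_pathogenic\t0.77\t0.00002\n"
    "P00000\tL30P\tlikely_pathogenic\t0.95\t0.01\n"
)

hits = rare_likely_pathogenic(io.StringIO(sample))
```

For a real file, the same function could be called on an opened file handle, for example `open(path, newline="")`, in place of the `io.StringIO` stand-in. Whether to keep or drop variants without an allele frequency depends on the analysis; the sketch keeps them.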
Provider:
Zenodo
Created:
2025-12-10