ClinVar-BERT + Evidence Extraction Models Prediction Results on ClinVar Data
DataCite Commons | Updated 2025-12-11 | Indexed 2026-04-25
Download link:
https://figshare.com/articles/dataset/ClinVar-BERT_Evidence_Extraction_Models_Prediction_Results_on_ClinVar_Data/30865748
Description:
ClinVar Submissions with ACMG Evidence Predictions Dataset

OVERVIEW

This dataset contains all ClinVar submissions that include a submission summary (Comment field). For each submission it provides machine learning model predictions for three types of ACMG/AMP evidence (computational, functional, and population), along with overall pathogenicity classification predictions from ClinVar-BERT.

The dataset was developed as part of research on scaling variant reclassification by extracting ACMG evidence from clinical summaries. It enables researchers to identify variants where specific evidence types support pathogenic or benign classifications, facilitating systematic variant re-evaluation and prioritization.

Source: ClinVar database (filtered for submissions containing submission summaries)

COLUMN DESCRIPTIONS

Variant Identifiers
- SCV: ClinVar Submission accession (unique identifier for each submission)
- VCV: ClinVar Variation accession (groups submissions for the same variant)
- RCV: ClinVar Reference accession (links variant to condition)
- VariationID: ClinVar numeric variation identifier

Variant Annotations
- HGVS_VariantDescription: HGVS nomenclature description of the variant
- GRCh38_Chr: Chromosome (GRCh38 reference genome)
- GRCh38_Start: Genomic start position (GRCh38)
- GRCh38_Stop: Genomic stop position (GRCh38)
- GRCh38_ReferenceAllele: Reference allele
- GRCh38_AlternateAllele: Alternate allele
- Gene: Gene symbol (original ClinVar annotation)
- gene: Gene symbol (standardized)
- aapos: Amino acid position
- aaref: Reference amino acid
- aaalt: Alternate amino acid

ClinVar Classifications
- VariantClassification: Aggregated ClinVar classification for the variant
- SubmissionClassification: Classification provided in this specific submission
- Submitter: Name of the submitting organization
- Comment: Submission summary text containing clinical interpretation and evidence
- ClinicalSignificance: Clinical significance classification
- clinvar_classification: Processed ClinVar classification
- classification_category: Simplified classification category
- has_conflicting_submissions: Boolean indicating whether the variant has conflicting interpretations across submissions

Computational Evidence Predictions (PP3/BP4)
- computational_final_label: Final predicted label (PP3, BP4, or No Evidence)
- computational_has_evidence: Boolean indicating whether computational evidence was detected
- computational_evidence_confidence: Confidence score for evidence detection (Stage 1)
- computational_predicted_evidence: Raw predicted evidence category
- computational_P_Score: Probability score for pathogenic computational evidence (PP3)
- computational_B_Score: Probability score for benign computational evidence (BP4)

Functional Evidence Predictions (PS3/BS3)
- functional_final_label: Final predicted label (PS3, BS3, or No Evidence)
- functional_has_evidence: Boolean indicating whether functional evidence was detected
- functional_evidence_confidence: Confidence score for evidence detection (Stage 1)
- functional_predicted_evidence: Raw predicted evidence category
- PS3_scores: Detailed PS3 prediction scores
- BS3_scores: Detailed BS3 prediction scores
- PS3_Score: Probability score for pathogenic functional evidence (PS3)
- BS3_Score: Probability score for benign functional evidence (BS3)

Population Evidence Predictions (BA1/BS1/PM2/PS4)
- population_final_label: Final predicted label (specific population code or No Evidence)
- population_has_evidence: Boolean indicating whether population evidence was detected
- population_evidence_confidence: Confidence score for evidence detection (Stage 1)
- population_predicted_evidence: Raw predicted evidence category
- population_P_Score: Probability score for pathogenic population evidence
- population_B_Score: Probability score for benign population evidence

ClinVar-BERT Pathogenicity Predictions
- prob_B_LB: Predicted probability of Benign/Likely Benign classification
- prob_VUS: Predicted probability of Variant of Uncertain Significance classification
- prob_P_LP: Predicted probability of Pathogenic/Likely Pathogenic classification
- predicted_label: Final predicted classification category (B/LB, VUS, or P/LP)

METHODS

Evidence Classification Framework: The evidence predictions were generated using a two-stage BioBERT-based classification framework. Stage 1 (Evidence Detection) determines whether a specific type of evidence (computational, functional, or population) is present in the submission summary. Stage 2 (Evidence Direction) classifies whether detected evidence supports a pathogenic or benign interpretation.

ClinVar-BERT: ClinVar-BERT is a BERT-based classifier that predicts overall variant pathogenicity (P/LP, VUS, or B/LB) based on the submission summary text.

USAGE NOTES

1. Filtering by Evidence Type: Use the *_has_evidence columns to filter for submissions containing specific evidence types.
2. Confidence Thresholds: The *_evidence_confidence and probability scores can be used to filter predictions at different confidence levels.
3. Reclassification Candidates: Variants classified as VUS but with evidence predictions suggesting P/LP or B/LB may be candidates for reclassification review.
4. Conflicting Interpretations: The has_conflicting_submissions flag identifies variants with discordant classifications across submitters.
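Usage notes 1 to 3 can be combined into a single screening pass. The Python sketch below is illustrative only and is not part of the released dataset.

It rests on several assumptions:
- The table has been exported as CSV with the column names listed above.
- Boolean columns are stored as the strings "True"/"False".
- The *_evidence_confidence values lie in [0, 1].
- VUS submissions carry the ClinVar term "Uncertain significance" in SubmissionClassification.

The helper names (supported_directions, reclassification_candidates), the 0.5 cutoff, and the toy rows are all hypothetical. The grouping of codes by direction follows the ACMG/AMP criteria: PP3, PS3, PM2, and PS4 support pathogenicity, and BP4, BS3, BA1, and BS1 support a benign interpretation.

```python
import csv
import io

# ACMG/AMP codes grouped by the direction they support.
PATHOGENIC_CODES = {"PP3", "PS3", "PM2", "PS4"}
BENIGN_CODES = {"BP4", "BS3", "BA1", "BS1"}
EVIDENCE_TYPES = ("computational", "functional", "population")


def supported_directions(row, min_confidence=0.5):
    """Return the directions ("pathogenic"/"benign") backed by detected evidence.

    Assumption: Boolean columns are serialized as "True"/"False" and
    *_evidence_confidence is a score in [0, 1].
    """
    directions = set()
    for kind in EVIDENCE_TYPES:
        if row[f"{kind}_has_evidence"] != "True":
            continue  # Stage 1 did not detect this evidence type
        if float(row[f"{kind}_evidence_confidence"]) < min_confidence:
            continue  # detected, but below the chosen confidence cutoff
        label = row[f"{kind}_final_label"]
        if label in PATHOGENIC_CODES:
            directions.add("pathogenic")
        elif label in BENIGN_CODES:
            directions.add("benign")
    return directions


def reclassification_candidates(rows, min_confidence=0.5,
                                vus_value="Uncertain significance"):
    """Yield (SCV, direction) for VUS submissions whose evidence agrees on one direction.

    Assumption: vus_value is how VUS appears in SubmissionClassification.
    """
    for row in rows:
        if row["SubmissionClassification"] != vus_value:
            continue
        directions = supported_directions(row, min_confidence)
        if len(directions) == 1:  # skip rows with no evidence or mixed directions
            yield row["SCV"], directions.pop()


# Hypothetical toy rows (not real ClinVar data) using a subset of the documented columns.
TOY_CSV = """SCV,SubmissionClassification,computational_has_evidence,computational_evidence_confidence,computational_final_label,functional_has_evidence,functional_evidence_confidence,functional_final_label,population_has_evidence,population_evidence_confidence,population_final_label
SCV_A,Uncertain significance,True,0.9,PP3,True,0.8,PS3,False,0.1,No Evidence
SCV_B,Uncertain significance,True,0.9,PP3,True,0.7,BS3,False,0.2,No Evidence
SCV_C,Pathogenic,True,0.95,PP3,False,0.1,No Evidence,True,0.9,PM2
SCV_D,Uncertain significance,False,0.2,No Evidence,False,0.3,No Evidence,True,0.4,BA1
"""

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(TOY_CSV))
    for scv, direction in reclassification_candidates(rows):
        print(scv, direction)  # prints: SCV_A pathogenic
```

The sketch skips submissions whose detected evidence points both ways (SCV_B pairs PP3 with BS3) rather than guessing a direction. Lowering min_confidence to 0.3 would also admit SCV_D through its weaker BA1 call. For a real export, pass `open(path, newline="")` to csv.DictReader in place of the in-memory string. has_conflicting_submissions or predicted_label could be added as further filters in the same way.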
Provider:
figshare
Created:
2025-12-11



