ClinVar Variant Classification Results
Source: DataCite Commons. Updated: 2025-06-04. Indexed: 2025-09-08.
Download link:
https://figshare.com/articles/dataset/ClinVar_Variant_Classification_Results/29224097/1
Description:
This CSV file contains predictions from a ClinVarBERT model that classifies genetic variant submissions into three pathogenicity categories: Pathogenic/Likely Pathogenic (P/LP), Variant of Uncertain Significance (VUS), and Benign/Likely Benign (B/LB). The model predicts each variant's category from the text of its ClinVar submission summary.

The output CSV contains the following columns:

Identifier Columns
SCV: Submission accession number from ClinVar (format: SCV000000000)
VCV: Variation accession number from ClinVar (format: VCV000000000)
RCV: Record accession number from ClinVar (format: RCV000000000)
VariationID: Numerical identifier for the genetic variation

Genomic Coordinates
GRCh38_Chr: Chromosome (1-22, X, Y, MT)
GRCh38_Start: Start position on the chromosome (GRCh38/hg38 assembly)
GRCh38_Stop: Stop position on the chromosome (GRCh38/hg38 assembly)
GRCh38_ReferenceAllele: Reference allele sequence
GRCh38_AlternateAllele: Alternate allele sequence

Protein-Level Information
aapos: Amino acid position in the protein
aaref: Reference amino acid (single-letter code)
aaalt: Alternate amino acid (single-letter code)
gene: Gene symbol (e.g., BRCA1, TP53)

Original Classification
SubmissionClassification: Original classification provided by the submitter. Values: "pathogenic", "likely pathogenic", "uncertain significance", "benign", "likely benign"

Input Text
Comment: The free-text comment or description submitted with the variant, used as the model's input

Model Predictions
prob_P_LP: Probability score for the Pathogenic/Likely Pathogenic class (0.0 to 1.0)
prob_VUS: Probability score for the Variant of Uncertain Significance class (0.0 to 1.0)
prob_B_LB: Probability score for the Benign/Likely Benign class (0.0 to 1.0)
predicted_label: Final predicted class, chosen as the one with the highest probability. Values: "P/LP", "VUS", "B/LB"

Notes on Probability Scores
The three probability scores (prob_P_LP, prob_VUS, prob_B_LB) sum to 1.0 for each row. A higher probability indicates greater model confidence in that class. The predicted_label is the class with the highest probability score.

Model Type: Fine-tuned transformer model for sequence classification
Input: Textual comments from variant submissions
Output: Three-class classification (P/LP, VUS, B/LB)
Training Data: ClinVar variant submissions with known classifications
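The consistency rules stated in the description (the three probabilities sum to 1.0, and predicted_label is the highest-probability class) can be checked with a short script. The following is a minimal sketch in Python using only the standard library. It is not part of the dataset. The column names come from the description above, while the tolerance `tol`, the helper names, and the sample rows are assumptions made up for illustration. The mapping from submitter labels to the three classes follows the category definitions in the description.

```python
import csv
import io

# Column names are taken from the dataset description above.
PROB_COLUMNS = {"P/LP": "prob_P_LP", "VUS": "prob_VUS", "B/LB": "prob_B_LB"}

# Submitter labels grouped into the three model classes, following the
# category definitions in the description.
SUBMITTER_TO_CLASS = {
    "pathogenic": "P/LP",
    "likely pathogenic": "P/LP",
    "uncertain significance": "VUS",
    "benign": "B/LB",
    "likely benign": "B/LB",
}


def check_row(row, tol=1e-3):
    """Return a list of problems found in one CSV row (empty if none).

    The tolerance `tol` is an assumption to absorb rounding in the file.
    """
    probs = {label: float(row[col]) for label, col in PROB_COLUMNS.items()}
    problems = []
    if abs(sum(probs.values()) - 1.0) > tol:
        problems.append("probabilities do not sum to 1")
    best = max(probs, key=probs.get)
    if row["predicted_label"] != best:
        problems.append("predicted_label is not the highest-probability class")
    return problems


def agrees_with_submitter(row):
    """True/False if the prediction matches the submitter's class, None if unmapped."""
    key = row["SubmissionClassification"].strip().lower()
    expected = SUBMITTER_TO_CLASS.get(key)
    if expected is None:
        return None
    return row["predicted_label"] == expected


def summarize(csv_file):
    """Count rows, inconsistent rows, and agreement with submitter labels."""
    stats = {"rows": 0, "inconsistent": 0, "agree": 0, "disagree": 0}
    for row in csv.DictReader(csv_file):
        stats["rows"] += 1
        if check_row(row):
            stats["inconsistent"] += 1
        agreement = agrees_with_submitter(row)
        if agreement is True:
            stats["agree"] += 1
        elif agreement is False:
            stats["disagree"] += 1
    return stats


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = """SCV,SubmissionClassification,prob_P_LP,prob_VUS,prob_B_LB,predicted_label
SCV000000001,Pathogenic,0.90,0.08,0.02,P/LP
SCV000000002,uncertain significance,0.10,0.30,0.60,B/LB
"""

if __name__ == "__main__":
    print(summarize(io.StringIO(SAMPLE)))
```

To run the same checks on the real data, pass a file object opened on the downloaded CSV (with `newline=""`) to `summarize` in place of the in-memory sample. Rows where the prediction disagrees with the submitter's label may be the most useful ones to review by hand.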
Provider:
figshare
Created:
2025-06-03



