Supplementary Data for 'Transformers Significantly Improve Splice Site Prediction'
Indexed in NIAID Data Ecosystem on 2026-05-02
Download link: https://zenodo.org/record/14109867
Description:
This repository contains supplementary data accompanying the manuscript "Transformers Significantly Improve Splice Site Prediction". The data include the annotations used to train our splice site prediction models, together with the delta-score predictions made by our Transformer 45k model and by SpliceAI 10k. These datasets are provided to facilitate replication of our results and to support further research on RNA splicing and on machine learning applications in genomics.
Contents:
Annotations Used for Model Training:
clinvar_splice_variants.tsv
Description: Contains detailed information about ClinVar splice variants used in our study.
Contents Include: Variant identifiers, genomic coordinates, associated clinical significance, and relevant annotations.
splice_site_annotation_gtex.tsv
Description: Splice site annotations derived from all tissues in GTEx V8.
Contents Include: Coordinates of splice sites and transcript information.
splice_site_annotation_icelandic_whole_blood_plus_gtex.tsv
Description: Splice site annotations derived from a combination of Icelandic whole blood samples and samples from all tissues in GTEx V8.
Contents Include: Coordinates of splice sites and transcript information.
Model Predictions:
a. SpliceAI 10k Model Predictions:
spliceai_10k_clinvar_delta.vcf
Description: SpliceAI 10k delta scores for ClinVar splice variants.
Contents Include: Variant Call Format (VCF) file containing delta scores that indicate the predicted impact on splicing for ClinVar variants.
spliceai_10k_no_sqtl_delta.vcf
Description: SpliceAI 10k delta scores for variants unlikely to be splicing quantitative trait loci (sQTLs) in Icelandic whole blood.
Contents Include: VCF file with delta scores for variants unlikely to be sQTLs.
spliceai_10k_sqtl_delta.vcf
Description: SpliceAI 10k delta scores for sQTLs detected in Icelandic whole blood.
Contents Include: VCF file with delta scores for variants identified as sQTLs.
b. Transformer 45k Model Predictions:
transformer_45k_clinvar_delta.vcf
Description: Transformer 45k delta scores for ClinVar splice variants.
Contents Include: VCF file with delta scores from our Transformer model, indicating the predicted impact on splicing.
transformer_45k_no_sqtl_delta.vcf
Description: Transformer 45k delta scores for variants unlikely to be sQTLs in Icelandic whole blood.
Contents Include: VCF file with delta scores for variants unlikely to be sQTLs, as predicted by our model.
transformer_45k_sqtl_delta.vcf
Description: Transformer 45k delta scores for sQTLs detected in Icelandic whole blood.
Contents Include: VCF file with delta scores for variants identified as sQTLs, based on our Transformer model predictions.
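The layout of the INFO column in these VCF files is not specified in this description. As a minimal sketch, assuming each record stores its scores as semicolon-separated key=value pairs named after the delta-score fields listed under Additional Information (for example top_a_creation_delta=0.05), the scores of each record could be read with the Python standard library alone. The helper names parse_info, parse_record and read_records are hypothetical.

```python
from typing import Dict, Iterator

# Hypothetical INFO keys, borrowed from the delta-score names in this
# description; the actual keys in the VCF files may differ.
SCORE_KEYS = (
    "top_a_creation_delta",
    "top_a_disruption_delta",
    "top_d_creation_delta",
    "top_d_disruption_delta",
)


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into a key -> value dict (flags map to '')."""
    fields: Dict[str, str] = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def parse_record(line: str) -> Dict[str, object]:
    """Parse one tab-separated VCF data line into coordinates and delta scores."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = cols[:5]
    info = parse_info(cols[7]) if len(cols) > 7 else {}
    record: Dict[str, object] = {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}
    for key in SCORE_KEYS:
        if key in info:
            record[key] = float(info[key])
    return record


def read_records(path: str) -> Iterator[Dict[str, object]]:
    """Yield parsed records from a VCF file, skipping header and blank lines."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            yield parse_record(line)
```

For a made-up data line such as "chr1\t12345\t.\tA\tG\t.\t.\ttop_a_creation_delta=0.05;top_a_disruption_delta=0.0;top_d_creation_delta=0.62;top_d_disruption_delta=0.01", parse_record returns a dict whose top_d_creation_delta entry is 0.62. If the real files pack the scores differently (for example into a single pipe-delimited field), only parse_record needs to change.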
Additional Information:
Delta Scores and Their Interpretation:
Each variant is assessed for its potential impact on splicing through four delta scores:
Acceptor Site Creation (top_a_creation_delta): Predicts the likelihood of creating a new acceptor site.
Acceptor Site Disruption (top_a_disruption_delta): Predicts the likelihood of disrupting an existing acceptor site.
Donor Site Creation (top_d_creation_delta): Predicts the likelihood of creating a new donor site.
Donor Site Disruption (top_d_disruption_delta): Predicts the likelihood of disrupting an existing donor site.
Final Delta Score Calculation:
The overall impact of a variant is determined by taking the maximum of these four delta scores:
final_delta_score = max(top_a_creation_delta, top_a_disruption_delta, top_d_creation_delta, top_d_disruption_delta)
A higher final delta score indicates a greater predicted impact on splicing.
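The formula above translates directly into a few lines of Python. This sketch also reports which of the four events attains the maximum. Holding a variant's scores in a plain dictionary keyed by the field names is an assumption about in-memory representation, not a format defined by the dataset.

```python
from typing import Mapping, Tuple

DELTA_FIELDS = (
    "top_a_creation_delta",
    "top_a_disruption_delta",
    "top_d_creation_delta",
    "top_d_disruption_delta",
)


def final_delta_score(scores: Mapping[str, float]) -> Tuple[float, str]:
    """Return the maximum of the four delta scores and the field that attains it.

    Python's max() returns the first maximal element, so ties resolve to the
    earliest field in DELTA_FIELDS.
    """
    best_field = max(DELTA_FIELDS, key=lambda field: scores[field])
    return scores[best_field], best_field


scores = {
    "top_a_creation_delta": 0.05,
    "top_a_disruption_delta": 0.0,
    "top_d_creation_delta": 0.62,
    "top_d_disruption_delta": 0.01,
}
print(final_delta_score(scores))  # (0.62, 'top_d_creation_delta')
```

Returning the field name alongside the score makes it easy to report whether a high-scoring variant is predicted to act through an acceptor or a donor site.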
Positions:
The position fields (*_pos) give the genomic coordinates at which the predicted splicing events occur, identifying the specific sites affected by the variant.
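Because the *_pos fields are genomic coordinates, the distance between a variant and each predicted event follows from a subtraction. A minimal sketch, assuming hypothetical field names formed by replacing _delta with _pos (for example top_d_creation_pos) and coordinates on the same chromosome and in the same convention as the variant position:

```python
from typing import Dict, Mapping

# Hypothetical event prefixes; '<event>_pos' is assumed to hold an absolute coordinate.
EVENTS = ("top_a_creation", "top_a_disruption", "top_d_creation", "top_d_disruption")


def event_offsets(record: Mapping[str, int], variant_pos: int) -> Dict[str, int]:
    """Signed distance in bp from the variant to each predicted event.

    Positive values lie at higher genomic coordinates than the variant.
    Whether that is upstream or downstream depends on the gene's strand.
    """
    offsets: Dict[str, int] = {}
    for event in EVENTS:
        key = f"{event}_pos"
        if key in record:
            offsets[event] = record[key] - variant_pos
    return offsets


print(event_offsets({"top_d_creation_pos": 12350, "top_a_disruption_pos": 12300}, 12345))
# {'top_a_disruption': -45, 'top_d_creation': 5}
```

If the files instead store offsets relative to the variant (as SpliceAI's own VCF annotations do), these values can be used directly and the subtraction is unnecessary.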
Interpreting Delta Scores:
Delta scores range from 0 to 1.
Scores closer to 1 suggest a higher probability of the variant affecting splicing.
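Turning the continuous final delta score into a yes/no call requires a cutoff, which this description does not prescribe. The SpliceAI authors suggest 0.5 as a default for SpliceAI delta scores. Whether that value is appropriate for the Transformer 45k scores is an assumption that should be checked against the manuscript. A minimal sketch with the cutoff as a parameter (flag_splice_altering is a hypothetical helper name):

```python
from typing import Iterable, List, Tuple


def flag_splice_altering(
    variants: Iterable[Tuple[str, float]], threshold: float = 0.5
) -> List[str]:
    """Return IDs of variants whose final delta score is at or above the cutoff.

    The 0.5 default is borrowed from SpliceAI guidance and is an assumption,
    not a threshold defined by this dataset.
    """
    flagged: List[str] = []
    for variant_id, score in variants:
        # Delta scores are documented to lie in [0, 1]; reject anything else.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{variant_id}: delta score {score} outside [0, 1]")
        if score >= threshold:
            flagged.append(variant_id)
    return flagged


print(flag_splice_altering([("var1", 0.62), ("var2", 0.12), ("var3", 0.5)]))
# ['var1', 'var3']
```

Lowering the threshold trades precision for recall. Comparing the two models at several cutoffs on the ClinVar and sQTL files is one way to see how that trade-off differs between them.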
Purpose:
These datasets support the findings reported in our manuscript by providing the raw data used for model training and evaluation.
Researchers can use these datasets to replicate our experiments, compare model performances, or conduct further studies on splice site prediction.
Citation:
Please cite this dataset as:
B.A. Jónsson, G.H. Halldórsson, S. Árdal, S. Rögnvaldsson, E. Einarsson, P. Sulem, D.F. Guðbjartsson, P. Melsted, K. Stefánsson, M.Ö. Úlfarsson (2023). Supplementary Data for "Transformers Significantly Improve Splice Site Prediction". Zenodo. https://doi.org/10.5281/zenodo.14109868
Contact Information:
For questions or further information, please contact:
Benedikt A. Jónsson
Affiliation: deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
Email: benediktj@decode.is
Created: 2024-11-13



