Supplementary Data for 'Transformers Significantly Improve Splice Site Prediction'

Indexed in NIAID Data Ecosystem: 2026-05-02
Download link:
https://zenodo.org/record/14109867
Description: This repository contains supplementary data accompanying the manuscript "Transformers Significantly Improve Splice Site Prediction". The data includes the annotations used for training our splice site prediction models and the predictions made by our model and by SpliceAI 10k. These datasets are provided to facilitate replication of our results and to support further research in RNA splicing and machine learning applications in genomics.

Contents:

Annotations Used for Model Training:

clinvar_splice_variants.tsv
  Description: Detailed information about the ClinVar splice variants used in our study.
  Contents include: variant identifiers, genomic coordinates, associated clinical significance, and relevant annotations.

splice_site_annotation_gtex.tsv
  Description: Splice site annotations derived from all tissues in GTEx V8.
  Contents include: splice site coordinates and transcript information.

splice_site_annotation_icelandic_whole_blood_plus_gtex.tsv
  Description: Splice site annotations derived from a combination of Icelandic whole blood samples and samples from all tissues in GTEx V8.
  Contents include: splice site coordinates and transcript information.

Model Predictions:

a. SpliceAI 10k Model Predictions:

spliceai_10k_clinvar_delta.vcf
  Description: SpliceAI 10k delta scores for ClinVar splice variants.
  Contents include: Variant Call Format (VCF) file containing delta scores that indicate the predicted impact of each ClinVar variant on splicing.

spliceai_10k_no_sqtl_delta.vcf
  Description: SpliceAI 10k delta scores for variants unlikely to be splicing quantitative trait loci (sQTLs) in Icelandic whole blood.
  Contents include: VCF file with delta scores for variants not associated with sQTLs.

spliceai_10k_sqtl_delta.vcf
  Description: SpliceAI 10k delta scores for sQTLs detected in Icelandic whole blood.
  Contents include: VCF file with delta scores for variants identified as sQTLs.

b. Transformer 45k Model Predictions:

transformer_45k_clinvar_delta.vcf
  Description: Transformer 45k delta scores for ClinVar splice variants.
  Contents include: VCF file with delta scores from our Transformer model, indicating the predicted impact on splicing.

transformer_45k_no_sqtl_delta.vcf
  Description: Transformer 45k delta scores for variants unlikely to be sQTLs in Icelandic whole blood.
  Contents include: VCF file with delta scores for variants not associated with sQTLs, as predicted by our model.

transformer_45k_sqtl_delta.vcf
  Description: Transformer 45k delta scores for sQTLs detected in Icelandic whole blood.
  Contents include: VCF file with delta scores for variants identified as sQTLs, based on our Transformer model predictions.

Additional Information:

Delta Scores and Their Interpretation: Each variant is assessed for its potential impact on splicing through four delta scores:
- Acceptor Site Creation (top_a_creation_delta): predicts the likelihood of creating a new acceptor site.
- Acceptor Site Disruption (top_a_disruption_delta): predicts the likelihood of disrupting an existing acceptor site.
- Donor Site Creation (top_d_creation_delta): predicts the likelihood of creating a new donor site.
- Donor Site Disruption (top_d_disruption_delta): predicts the likelihood of disrupting an existing donor site.

Final Delta Score Calculation: The overall impact of a variant is determined by taking the maximum of these four delta scores:

  final_delta_score = max(top_a_creation_delta, top_a_disruption_delta, top_d_creation_delta, top_d_disruption_delta)

A higher final delta score indicates a greater predicted impact on splicing.

Positions: The position fields (*_pos) give the genomic coordinates where the predicted splicing events occur, identifying the specific sites affected by the variant.

Interpreting Delta Scores: Delta scores range from 0 to 1. Scores closer to 1 indicate a higher predicted probability that the variant affects splicing.
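The final delta score rule can be applied directly to the VCF files. The sketch below is a minimal illustration that assumes the four delta scores are stored as key=value pairs in the VCF INFO column under the field names listed above; the actual layout of the released files may differ (for example, a single pipe-delimited annotation as in SpliceAI's own output), so check the file headers before relying on it. The helper names (parse_info, final_delta_score, read_final_scores, compare_models, open_vcf) and the example record are hypothetical, not part of the dataset.

```python
import gzip
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Delta-score field names as listed in the dataset description.
# ASSUMPTION: they appear as INFO keys (key=value) in the VCF files.
DELTA_FIELDS = (
    "top_a_creation_delta",
    "top_a_disruption_delta",
    "top_d_creation_delta",
    "top_d_disruption_delta",
)

VariantKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column ('k1=v1;k2=v2;FLAG') into a dict."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else ""  # flag fields get an empty value
    return fields


def final_delta_score(info: Dict[str, str]) -> Optional[float]:
    """Maximum of the four delta scores; None if any is missing or non-numeric."""
    try:
        return max(float(info[name]) for name in DELTA_FIELDS)
    except (KeyError, ValueError):
        return None


def read_final_scores(lines: Iterable[str]) -> Iterator[Tuple[VariantKey, float]]:
    """Yield ((CHROM, POS, REF, ALT), final_delta_score) for each data line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the header line and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = cols[:5]
        info = parse_info(cols[7]) if len(cols) > 7 else {}
        score = final_delta_score(info)
        if score is not None:
            yield (chrom, int(pos), ref, alt), score


def compare_models(
    scores_a: Iterable[Tuple[VariantKey, float]],
    scores_b: Iterable[Tuple[VariantKey, float]],
) -> Dict[VariantKey, Tuple[float, float]]:
    """Pair the final scores of two models on the variants they share."""
    a, b = dict(scores_a), dict(scores_b)
    return {key: (a[key], b[key]) for key in a.keys() & b.keys()}


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Made-up record for illustration only. With the real files you would use e.g.
    #   with open_vcf("transformer_45k_clinvar_delta.vcf") as fh:
    #       scores = list(read_final_scores(fh))
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t12345\t.\tA\tG\t.\t.\ttop_a_creation_delta=0.10;"
        "top_a_disruption_delta=0.02;top_d_creation_delta=0.71;"
        "top_d_disruption_delta=0.00\n",
    ]
    for key, score in read_final_scores(example):
        print(key, score)  # ('chr1', 12345, 'A', 'G') 0.71
```

Taking the maximum treats the four event types symmetrically, as the description specifies. Keeping the per-field values (and the matching *_pos fields) alongside the maximum would show which event drives a high score, and compare_models gives a simple way to line up SpliceAI 10k and Transformer 45k predictions for the same variants.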
Purpose: These datasets support the findings reported in our manuscript by providing the raw data used for model training and evaluation. Researchers can use these datasets to replicate our experiments, compare model performance, or conduct further studies on splice site prediction.

Citation: Please cite this dataset as:
B.A. Jónsson, G.H. Halldórsson, S. Árdal, S. Rögnvaldsson, E. Einarsson, P. Sulem, D.F. Guðbjartsson, P. Melsted, K. Stefánsson, M.Ö. Úlfarsson (2023). Supplementary Data for "Transformers Significantly Improve Splice Site Prediction". Zenodo. https://doi.org/10.5281/zenodo.14109868

Contact Information: For questions or further information, please contact:
Benedikt A. Jónsson
Affiliation: deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
Email: benediktj@decode.is
Created: 2024-11-13