X-CRISP: Domain-Adaptable and Interpretable CRISPR Repair Outcome Prediction - Preprocessed CRISPR repair outcomes and train/test target splits
Source: Figshare. Updated: 2025-06-18. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/X-CRISP_Domain-Adaptable_and_Interpretable_CRISPR_Repair_Outcome_Prediction_-_Preprocessed_CRISPR_repair_outcomes_and_train_test_target_splits/29260037
Resource description:
<b>Motivation:</b> Controlling the outcomes of CRISPR editing is crucial for the success of gene therapy. Since donor template-based editing is often inefficient, alternative strategies have emerged that leverage mutagenic end-joining repair instead. Existing machine learning models can accurately predict end-joining repair outcomes; however, generalisability beyond the specific cell line used for training remains a challenge, and interpretability is typically limited by suboptimal feature representation and model architecture.<br><b>Results:</b> We propose X-CRISP, a flexible and interpretable neural network for predicting repair outcome frequencies based on a minimal set of outcome and sequence features, including microhomologies (MH). Outperforming prior models on detailed and aggregate outcome predictions, X-CRISP prioritised MH location over MH sequence properties such as GC content for deletion outcomes. Through transfer learning, we adapted X-CRISP, pre-trained on wild-type mESC data, to the human cell lines K562, HAP1, and U2OS, and to mESC lines with altered DNA repair function. Adapted X-CRISP models improved over direct training on target data with as few as 50 samples, suggesting that this strategy could be leveraged to build models for new domains using a fraction of the data required to train models from scratch.<br><b>Article DOI:</b> https://doi.org/10.1101/2025.02.06.636858<br><b>Details:</b> The repository contains FASTA files detailing the target sequences used for training and testing. Each entry in a FASTA file describes the target_name, the index of the PAM sequence, and the orientation of the sequence (FORWARD/REVERSE). The repository also contains all processed CRISPR mutational outcomes after the "Data and pre-processing" steps were completed. There is one zip file per cell type.
The zip files contain one TSV file per processed target site, with the following headers:<br>Type: INSERTION/DELETION<br>Size: Insertion/deletion size<br>Start: Start location<br>InsSeq: Inserted sequence length (if insertion)<br>homologyLength: Homology length (if deletion)<br>countEvents: Repair outcome counts<br><br>
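As a rough illustration of how the per-target TSV files might be consumed, the sketch below reads a table with the listed headers and converts countEvents into relative outcome frequencies. It assumes the files are tab-separated with a header row matching the column names above; the function name `outcome_frequencies` and the example rows are hypothetical and the values are invented, not taken from the dataset.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = ["Type", "Size", "Start", "InsSeq", "homologyLength", "countEvents"]


def outcome_frequencies(tsv_text):
    """Return (row, frequency) pairs for one target site.

    Assumes a tab-separated table with a header row; the frequency of an
    outcome is its countEvents divided by the total over all rows.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    total = sum(float(r["countEvents"]) for r in rows)
    if total == 0:
        # No observed events: report zero frequency for every outcome.
        return [(r, 0.0) for r in rows]
    return [(r, float(r["countEvents"]) / total) for r in rows]


# Invented example with two deletion outcomes (not real data).
example = (
    "\t".join(COLUMNS) + "\n"
    + "DELETION\t3\t-2\t\t1\t30\n"
    + "DELETION\t5\t-4\t\t2\t10\n"
)

freqs = outcome_frequencies(example)
for row, freq in freqs:
    print(row["Type"], row["Size"], round(freq, 3))
```

For a file extracted from one of the zip archives, the same function could be applied to the text returned by reading that file, provided the actual layout matches the assumptions stated above.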
Provided by:
Seale, Colm
Created:
2025-06-18