Additional file 2: of Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting

Indexed by NIAID Data Ecosystem, 2026-03-10

Download link:

https://figshare.com/articles/dataset/Additional_file_2_of_Unsupervised_correction_of_gene-independent_cell_responses_to_CRISPR-Cas9_targeting/6963983

Description:

Figure S1. CRISPR-KO screening data quality assessment. (A) Average correlation between sgRNA read-count replicates across cell lines. (B) Receiver operating characteristic (ROC) curve obtained from classifying fitness essential (FE) and non-essential genes based on the average logFC of their targeting sgRNAs. An example cell line, OVCAR-8, is shown. (C) Area under the ROC curve (AUROC) obtained across cell lines from classifying FE and non-essential genes based on the average logFC of their targeting sgRNAs. (D) Recall for sets of a priori known essential genes from MSigDB and from the literature when classifying FE and non-essential genes across cell lines (5% FDR). Each circle represents a cell line and is coloured by tissue type. Box and whisker plots show median, inter-quartile ranges and 95% confidence intervals. (E) Genes ranked based on the average logFC of their targeting sgRNAs for OVCAR-8, and enrichment of genes belonging to predefined sets of a priori known essential genes from MSigDB, at an FDR equal to 5% when classifying FE (second-to-last column) and non-essential genes (last column). Blue numbers at the bottom indicate the classification true positive rate (recall).

Figure S2. Assessment of copy number bias before and after CRISPRcleanR correction across cell lines. sgRNA logFC values before and after CRISPRcleanR correction are shown for eight cell lines, classified by copy number (amplified or deleted) and expression status. Copy number segments were identified using the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. Box and whisker plots show median, inter-quartile ranges and 95% confidence intervals. Asterisks indicate significant associations between sgRNA logFC values (Welch's t-test, p < 0.005) and their different effect sizes accounting for the standard deviation (Cohen’s d), compared to the whole sgRNA library.

Figure S3. CN-associated effect on sgRNA logFC values in highly biased cell lines. For 3 cell lines, recall curves of non-essential genes, fitness essential genes, copy number (CN) amplified genes and CN amplified non-expressed genes, obtained when classifying genes based on the average logFC values of their targeting sgRNAs.

Figure S4. Assessment of CN-associated bias across all cell lines. LogFC values of sgRNAs averaged within segments of equal copy number (CN). One plot per cell line, showing the CN value at which a significant difference (Welch's t-test, p < 0.05) with respect to the logFCs corresponding to CN = 2 is first observed (bias starting point) and the CN value from which these differences start to increase significantly and continuously (bias critical point). CN-associated bias is shown for all sgRNAs, when excluding FE genes and histones, and for non-expressed genes only. Box and whisker plots show median, inter-quartile ranges and 95% confidence intervals.

Figure S5. CRISPRcleanR correction when varying the minimal number of genes required, and the effect of fitness essential genes. Recall reduction of (A) amplified or (B) amplified non-expressed genes versus that of fitness essential and other prior known essential genes, when running CRISPRcleanR correction with varying values of the minimal number of genes to be targeted by sgRNAs in a biased segment (default n = 3). Similar results were observed when performing the analysis including or excluding known essential genes.

Figure S6. CRISPRcleanR performance across 342 cell lines from an independent dataset. Recall at 5% FDR of predefined sets of genes based on their uncorrected or corrected logFCs (coordinates on the two axes), averaged across targeting sgRNAs, for 342 cell lines from Project Achilles.

Figure S7. CRISPRcleanR performance in relation to data quality. The impact of data quality on recall at 5% false discovery rate (FDR), assessed following CRISPRcleanR correction for predefined sets of genes. Project Achilles data (n = 342 cell lines) were binned based on the quality of the uncorrected essentiality profile. Quality was measured as the recall at 5% FDR for predefined essential genes (from the Molecular Signatures Database, MSigDB); cell lines were sorted on this value and grouped into 10 equidistant bins (1 = lowest quality, 10 = highest quality). The recall increment for fitness essential genes was greatest for the lower quality data, indicating that CRISPRcleanR can improve the true signal of gene depletion in low quality data.

Figure S8. Minimal impact of CRISPRcleanR on loss/gain-of-fitness effects. (A) The percentage of genes for which the significance of the fitness effect (gain- or loss-of-fitness) is altered after CRISPRcleanR correction, for Project Score and Project Achilles data. The upper row shows correction effects for all screened genes and the lower row for the subset of genes with a significant effect in the uncorrected data. Each dot is a separate cell line. Blue dots indicate the percentage of genes for which significance is lost or gained post-correction. Green dots indicate the percentage of genes for which the fitness effect is distorted, that is, opposite to the effect in the uncorrected data. (B) The majority of the loss-of-fitness genes impacted by correction are putative false positives: genes that are either not expressed (FPKM < 0.5), amplified, known non-essential, or exhibit a mild phenotype in the screening data. (C) Summary of the overall impact of CRISPRcleanR on fitness effects following correction, considering data for all cell lines. The colours reflect the percentage of genes with a loss-of-fitness, no phenotype or gain-of-fitness effect that are retained in the corrected data.

Figure S9. CRISPRcleanR retains cancer driver gene dependencies in Project Score and Achilles data. (A) Each circle represents a tested cancer driver gene dependency (mutation or amplification of a copy number segment), positioned by its statistical significance from MaGeCK before (x-axis) and after (y-axis) CRISPRcleanR correction, across the two screens. Plots in the first row show depletion FDR values pre- and post-correction, whereas those in the second row show depletion FDR values pre-correction and enrichment FDR values post-correction. (B) Details of the tested genetic dependencies and whether they are shared before and after CRISPRcleanR correction at two different thresholds of statistical significance (5% and 10% FDR for the first and second rows of plots, respectively). The third row indicates the type of alteration involving the cancer driver genes under consideration and the total number of cell lines with an alteration.

(ZIP 191 kb)
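
Several legends above (Figures S1, S6 and S7) refer to the AUROC and to recall at 5% FDR when classifying fitness essential versus non-essential genes by average logFC. The following is a minimal sketch of how such metrics could be computed, not the authors' implementation. The function names are hypothetical, the logFC values are invented toy numbers rather than values from the dataset, and FDR is assumed here to mean FP / (TP + FP) among genes at or below a logFC threshold.

```python
from typing import Sequence


def auroc(essential: Sequence[float], nonessential: Sequence[float]) -> float:
    """Probability that a random essential gene has a lower (more depleted)
    average logFC than a random non-essential gene; ties count as 0.5."""
    wins = 0.0
    for e in essential:
        for n in nonessential:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(essential) * len(nonessential))


def recall_at_fdr(essential: Sequence[float],
                  nonessential: Sequence[float],
                  fdr: float = 0.05) -> float:
    """Recall of essential genes at the most permissive logFC threshold
    whose FDR (assumed: FP / (TP + FP)) does not exceed `fdr`."""
    labelled = sorted([(s, True) for s in essential] +
                      [(s, False) for s in nonessential])
    tp = fp = best_tp = 0
    i = 0
    while i < len(labelled):
        # Add every gene tied at the current score before evaluating the FDR.
        j = i
        while j < len(labelled) and labelled[j][0] == labelled[i][0]:
            if labelled[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / (tp + fp) <= fdr:
            best_tp = tp
        i = j
    return best_tp / len(essential)


# Toy average logFCs (illustrative only, not taken from the dataset).
toy_essential = [-2.1, -1.8, -1.5, -0.2]
toy_nonessential = [-0.4, 0.0, 0.1, 0.3]

print(auroc(toy_essential, toy_nonessential))          # 0.9375
print(recall_at_fdr(toy_essential, toy_nonessential))  # 0.75
```

In the toy example, three of the four essential genes rank below every non-essential gene, so the recall at 5% FDR is 0.75; the fourth essential gene only enters after a non-essential gene has already pushed the FDR above the threshold.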

Created:

2018-08-14