
Whole-exome sequencing of mouse cancer cell lines from the Mouse Cancer Cell Line Atlas (MCCA)

Indexed from NIAID Data Ecosystem on 2026-05-10
Download link:
https://www.ncbi.nlm.nih.gov/sra/ERP186434
Resource description:
The Mouse Cancer Cell Line Atlas (MCCA) is a comprehensive resource of 590 murine cancer cell lines derived from 81 genetically engineered, inflammation-associated, or irradiation-induced mouse models of malignancy, complemented by 38 widely used public cell lines. MCCA spans 22 cell lineages and 46 cancer entities and is designed to enable comparative, mechanistic, and functional studies across diverse tissues and oncogenic contexts.

Whole-exome sequencing (WES) was performed on 200 MCCA cell lines, using 450 ng of genomic DNA (gDNA) from each cell line and from a matched normal sample (mouse tail biopsy). Coding exons were enriched by whole-exome pull-down with the Agilent SureSelect XT Mouse All Exon Kit according to the manufacturer's instructions and sequenced on an Illumina NovaSeq 6000 system.

Tumor/normal WES data were analyzed following the GATK Best Practices recommendations, and all samples were processed with the established MoCaSeq analysis pipeline (Lange et al. [Nat Protoc. 2020 Feb;15(2):266-315]). Raw reads were trimmed with Trimmomatic (v0.39) (Bolger et al. [Bioinformatics. 2014 Aug 1;30(15):2114-20]): leading and trailing bases with Phred scores below 25 were removed, an average base quality of at least 25 was enforced over a sliding window of 10 nucleotides, and reads shorter than 50 nucleotides were discarded. Passing reads were aligned to the GRCm38.p6 reference genome with BWA-MEM (v0.7.17) (Li et al. [Bioinformatics. 2009 Jul 15;25(14):1754-60]) using default settings. Mapped reads were then processed with samblaster (v0.1.26) (Faust et al. [Bioinformatics. 2014 Sep 1;30(17):2503-5]), sambamba (v0.7.0) (Tarasov et al. [Bioinformatics. 2015 Jun 15;31(12):2032-4]), and Picard tools (v2.20.0) [http://broadinstitute.github.io/picard]. Somatic single-nucleotide variants (SNVs) and indels were called with Mutect2 from the GATK toolkit (v4.2.0.0) using default settings.
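The trimming thresholds above (Phred 25 for leading/trailing bases, a 10-nt window with mean quality 25, minimum length 50) can be illustrated with a small sketch. This is a simplified re-implementation for explanation only, not Trimmomatic itself: the step order (LEADING, TRAILING, SLIDINGWINDOW, MINLEN) is an assumption, the sliding-window step simply cuts at the start of the first failing window (Trimmomatic's exact cut point may differ), and the helper names `phred33` and `trim_read` are hypothetical.

```python
from typing import List, Optional

# Thresholds taken from the description above.
LEADING_Q = 25   # drop leading bases with Phred < 25
TRAILING_Q = 25  # drop trailing bases with Phred < 25
WINDOW = 10      # sliding-window width in nucleotides
WINDOW_Q = 25    # minimum mean Phred score within the window
MIN_LEN = 50     # discard reads shorter than 50 nt after trimming


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_read(quals: List[int]) -> Optional[slice]:
    """Return the slice of the read to keep, or None if the read is dropped.

    Hypothetical, simplified illustration; the step order is assumed.
    """
    start, end = 0, len(quals)
    # LEADING: remove low-quality bases from the 5' end
    while start < end and quals[start] < LEADING_Q:
        start += 1
    # TRAILING: remove low-quality bases from the 3' end
    while end > start and quals[end - 1] < TRAILING_Q:
        end -= 1
    # SLIDINGWINDOW: scan 5'->3' and cut at the first window whose mean
    # quality falls below WINDOW_Q (simplified cut point)
    for i in range(start, end - WINDOW + 1):
        if sum(quals[i:i + WINDOW]) / WINDOW < WINDOW_Q:
            end = i
            break
    # MINLEN: discard reads that are now too short
    if end - start < MIN_LEN:
        return None
    return slice(start, end)
```

For a FASTQ record, `trim_read(phred33(quality_line))` gives the region to keep; applying the returned slice to both the sequence and quality strings yields the trimmed read.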
Variants were filtered for read-orientation artifacts using GATK. For each tumor sample, the corresponding matched normal sample was used to filter out germline variants. Candidate somatic mutations were additionally filtered against known SNPs by excluding variants listed in the Wellcome Trust Sanger Mouse Genome Project SNP database (v5) (ENA study PRJEB11471). Furthermore, somatic mutations were filtered out if (i) read coverage was below 5 in both the normal and the tumor sample, (ii) the variant allele frequency (VAF) was below 5%, and (iii) fewer than 2 reads supported the variant in the tumor sample while 0 or 1 reads supported it in the normal sample. Somatic variants were annotated with SnpEff (v4.3) (Cingolani et al. [Fly (Austin). 2012 Apr-Jun;6(2):80-92]). SNVs with a low predicted impact, as well as variants at non-exonic sites, were excluded from further analysis.

Tumor/normal DNA copy ratios were determined with CNVkit (v0.9.9) (Talevich et al. [PLoS Comput Biol. 2016 Apr 21;12(4):e1004873]). Copy number calling used the "batch" command of the CNVkit pipeline, which performs read-coverage estimation, normalization, and segmentation. The probe regions of the Agilent SureSelect XT Mouse All Exon Kit were used as on-target regions.
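The post-calling filters described above can be sketched as a single predicate. This is a hypothetical illustration, not the MoCaSeq code: the `Candidate` record and its field names are invented for the example, known SNPs are assumed to be matched on `(chrom, pos, ref, alt)`, VAF is approximated as alternate reads divided by tumor depth (Mutect2 reports its own model-based allele fraction), each criterion is assumed to be applied independently, and criterion (iii) follows a literal reading of the wording above.

```python
from dataclasses import dataclass
from typing import Set, Tuple

Key = Tuple[str, int, str, str]  # (chrom, pos, ref, alt); assumed matching key


@dataclass
class Candidate:
    # Hypothetical record layout; field names are illustrative only.
    chrom: str
    pos: int
    ref: str
    alt: str
    tumor_depth: int
    tumor_alt: int
    normal_depth: int
    normal_alt: int


def passes_filters(v: Candidate, known_snps: Set[Key]) -> bool:
    """Return True if a candidate survives the (assumed) filter criteria."""
    # Exclude variants present in the known-SNP set (germline polymorphisms)
    if (v.chrom, v.pos, v.ref, v.alt) in known_snps:
        return False
    # (i) coverage below 5 in both the normal and the tumor sample
    if v.tumor_depth < 5 and v.normal_depth < 5:
        return False
    # (ii) tumor VAF below 5%, approximated here as alt reads / depth
    vaf = v.tumor_alt / v.tumor_depth if v.tumor_depth else 0.0
    if vaf < 0.05:
        return False
    # (iii) literal reading: < 2 supporting tumor reads and 0 or 1 in normal
    if v.tumor_alt < 2 and v.normal_alt <= 1:
        return False
    return True
```

Keeping each criterion as a separate early return makes it easy to log which rule removed a variant if the literal reading of (iii) turns out to differ from the pipeline's actual behavior.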
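SnpEff writes its predictions to the VCF `ANN` INFO field as comma-separated entries whose pipe-separated third field is the putative impact (`HIGH`, `MODERATE`, `LOW`, or `MODIFIER`). A minimal sketch of the impact/exonic filter, assuming that "low predicted impact" corresponds to SnpEff's `LOW` class and that non-exonic sites are annotated as `MODIFIER` (both mappings are assumptions, not stated in the source), could keep a variant only if at least one annotation is `HIGH` or `MODERATE`. The function names are hypothetical, and the input is the `ANN` value with the `ANN=` prefix already removed.

```python
from typing import List

# Assumption: LOW and MODIFIER annotations are the ones excluded.
KEEP_IMPACTS = {"HIGH", "MODERATE"}


def ann_impacts(ann: str) -> List[str]:
    """Return the impact (third pipe-separated field) of each ANN entry."""
    impacts = []
    for entry in ann.split(","):
        fields = entry.split("|")
        if len(fields) > 2:
            impacts.append(fields[2])
    return impacts


def keep_variant(ann: str) -> bool:
    """Keep a variant if any of its annotations has a retained impact."""
    return any(impact in KEEP_IMPACTS for impact in ann_impacts(ann))
```

Checking every annotation (rather than only the first) matters because a single variant can be annotated against several transcripts with different impacts.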
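The tumor/normal copy ratio reported by CNVkit is a log2 ratio of normalized read depth. The sketch below shows only the core arithmetic, median-normalizing each sample's per-bin depth and taking log2(tumor/normal) per bin; CNVkit itself applies further bias corrections and then segments the ratios, so this is a simplified illustration and `log2_copy_ratios` is a hypothetical name.

```python
import math
from statistics import median
from typing import List


def log2_copy_ratios(tumor: List[float], normal: List[float]) -> List[float]:
    """Per-bin log2 ratio of median-normalized tumor vs. normal depth.

    Simplified illustration; no GC or other bias correction is applied.
    """
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must cover the same bins")
    t_med, n_med = median(tumor), median(normal)
    ratios = []
    for t, n in zip(tumor, normal):
        if t <= 0 or n <= 0:
            ratios.append(float("nan"))  # no usable coverage in this bin
        else:
            ratios.append(math.log2((t / t_med) / (n / n_med)))
    return ratios
```

For example, tumor depths `[100, 100, 200, 50]` against a flat normal of `[50, 50, 50, 50]` give ratios of 0, 0, +1, and -1. In a pure diploid sample, a one-copy gain corresponds to log2(3/2) ≈ 0.58 and a one-copy loss to log2(1/2) = -1.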
Created: 2026-01-01