Germline and somatic eQTL summary statistics from the analysis of MMRF CoMMpass Multiple Myeloma tumor samples. ATACseq of Multiple Myeloma cell lines.
DataCite Commons. Updated: 2025-04-01. Indexed: 2024-08-26.
Download link:
https://figshare.com/articles/dataset/Germline_and_somatic_eQTL_summary_statistics_from_the_analysis_of_MMRF_CoMMpass_Multiple_Myeloma_tumor_samples_ATACseq_of_Multiple_Myeloma_cell_lines_/26307886/1
Description:
Peripheral blood whole genome sequence and tumor whole transcriptome sequence data were available for 607 donors. Tumor whole genome sequence data were available for 519 donors.

Germline variants were called by first producing per-sample raw genotype likelihoods using GATK HaplotypeCaller with a minimum base quality score of 10, and then joint genotyping all the per-sample gVCFs using GenotypeGVCFs.

Imputation of missing genotypes was performed in 5 Mb intervals across chromosomes with IMPUTE2, using the 1000 Genomes Phase 3 multiethnic reference panel. Imputation of the X-chromosomal alleles was carried out by treating the pseudoautosomal regions (PAR1 and PAR2) as autosomal and adding the --chrx flag for the imputation of the X-linked region. Genotype data were filtered to retain genotypes with probability scores ≥0.9 and variants with minor allele frequency (MAF) >0.05 and minor allele count (MAC) >10. Somatic mutations were called from paired tumor and normal samples using Mutect.

Whole genome and RNA short sequence reads were aligned to a custom reference genome containing the human reference hs37d5, the human ribosomal complete repeating unit sequence, genome sequences of 14 oncoviruses, and FASTA sequences of ERCC RNA spike-in mixes.

WGS reads were aligned using BWA-MEM v.0.7.8. BAM files were processed with Picard’s MarkDuplicates (http://broadinstitute.github.io/picard) and GATK IndelRealigner with default settings. RNAseq reads were aligned using STAR v.2.3.1z_r395 with the following parameter settings: outFilterMismatchNmax 10, outFilterMismatchNoverLmax 0.1, alignIntronMin 20, alignIntronMax 1000000, alignMatesGapMax 1000000, alignSJoverhangMin 8, alignSJDBoverhangMin 1, seedSearchStartLmax 30, chimSegmentMin 15, chimJunctionOverhangMin 15. Read counts were quantified using HTSeq.

Gene expression data were filtered and normalized for eQTL analyses.
Genes with FPKM >0.1 and a read count of >6 in at least 50 samples were considered expressed and included in the analysis. The distributions of FPKM in each sample and gene were transformed to the quantiles of the standard normal distribution. We used linear regression to model the expression level of each gene as a function of each individual's copy number state at that gene. The residuals of these models were used as the final expression data. CNA data were available for 574 individuals and 25,553 of the 25,602 tested genes. The copy-number state of the remaining individuals/genes was set to 0.

100 PEER factors were included as covariates to account for technical batch effects in the expression data (Supplementary Materials). Sex and five genotype principal components (PCs) were also included as covariates to account for population structure.

Germline variant effects on tumor gene expression were identified by linear regression as implemented in QTLtools. Variants within 1 Mb of the gene under investigation were considered for testing. P-values of the top associations, adjusted for the number of variants tested in cis, were obtained using 10,000 permutations. False discovery rate (FDR) adjusted p-values were calculated to account for the multiple phenotypes tested. Significant associations were selected using an FDR-adjusted p-value threshold of 0.01. To identify differential eQTL effects between the sexes, we mapped cis-acting germline eQTLs in males and females separately, as well as in the joint analysis of both sexes.

Single nucleotide mutations in 520 tumor samples were annotated using SnpEff. To identify non-deleterious, non-coding, putatively regulatory mutations, the data were filtered to exclude mutations that alter the gene product by retaining variants with the SnpEff effect impact classification “MODIFIER”. For somatic eQTL mapping, clusters of mutations within 50 bp were binned to identify recurrently mutated loci.
Loci with mutations in fewer than 5 tumor samples were removed from further analyses. The center position of each bin was used in the eQTL analysis. Associations between mutations within a 50 kb cis-window of TSSs and target gene expression levels were obtained with QTLtools using the nominal pass. Significant associations were selected based on an FDR-adjusted p-value threshold of 0.2.

We used open chromatin profiling of MM cell lines to characterize the chromatin landscape in MM. The MM cell lines used in this study are the male cell lines OCI-My5 and LP-1 and the female cell lines KMS20 and AMO1. Cells were cultured at 37 °C with 5% CO2 in advanced RPMI 1640 (Gibco) + Glutamax supplemented with 4% Fetal Bovine Serum and 1% penicillin-streptomycin. HEK293T cells used for lentiviral production were cultured at 37 °C with 5% CO2 in DMEM supplemented with 4 mM L-Glutamine, 10% Fetal Bovine Serum and 1% penicillin-streptomycin.

For ATAC-seq, transposition and library construction were performed as described in doi.org/10.1038/nmeth.4396. 100,000 cells from cell cultures with high viability (above 90%) were treated with DNase (Worthington Cat# LS002007) at a concentration of 200 U/ml in culture medium at 37 °C for 30 minutes. Cells were washed thoroughly 3 times with 1X PBS to remove DNase completely, and the cell pellet was collected by centrifuging at 500 RCF at 4 °C for 5 minutes. Cell lysis was performed for 3 minutes on ice using 50 µl of cold ATAC-resuspension buffer (RSB) containing 0.1% NP40, 0.1% Tween-20 and 0.01% Digitonin. Lysis buffer was washed out with 1 ml of cold ATAC-RSB containing 0.1% Tween-20 but no NP40 or digitonin, and the nuclei pellet was collected by centrifuging at 500 RCF at 4 °C for 10 minutes. The transposition reaction was performed using 25 µl 2X TD buffer and 2.5 µl transposase (Illumina) for 30 minutes at 37 °C in a thermocycler with 1000 RPM mixing.
Transposed fragments were cleaned up with the Zymo DNA Clean and Concentrator-5 kit (Zymo Research, Cat# D4014), and pre-amplification of transposed libraries was performed with 2X NEBNext Master Mix (NEB, Cat# ) using the following program: 72 °C for 5 min, 98 °C for 30 sec, and 5 cycles of (98 °C for 10 sec, 63 °C for 30 sec, 72 °C for 1 min) with the corresponding primers. 5 µl of pre-amplification products were used in a qPCR reaction to determine the additional cycles needed, and transposition libraries were purified using the Zymo DNA Clean and Concentrator-5 kit (Zymo Research, Cat# D4014). Library QC was assayed on an Agilent 2200 TapeStation using D5000 high sensitivity tape, and library quantification was performed on Qubit prior to sequencing on the NextSeq High 500/550 platform (Illumina).

ATAC-seq reads were mapped to the custom human reference genome using bowtie2 v.2.3.0. SAM files were converted to BAM with samtools and coordinate-sorted with the Picard toolkit’s SortSam. BAM files of the same samples from different lanes were merged and mitochondrial reads were removed using samtools. Optical duplicates were removed with the Picard toolkit’s MarkDuplicates. Fragment distribution statistics were collected with the Picard toolkit’s CollectInsertSizeMetrics. Peak calling was performed using HMMRATAC with default parameters.
Peak annotation and plots were generated using the R package ChIPseeker.

QTLtools permutation pass columns:
- The phenotype ID
- The chromosome ID of the phenotype
- The start position of the phenotype
- The end position of the phenotype
- The strand orientation of the phenotype
- The total number of variants tested in cis
- The distance between the phenotype and the tested variant (accounting for strand orientation)
- The ID of the top variant
- The chromosome ID of the top variant
- The start position of the top variant
- The end position of the top variant
- The number of degrees of freedom used to compute the P-values
- Dummy
- The first parameter value of the fitted beta distribution
- The second parameter value of the fitted beta distribution (it also gives the effective number of independent tests in the region)
- The nominal P-value of association between the phenotype and the top variant in cis
- The corresponding regression slope
- The P-value of association adjusted for the number of variants tested in cis given by the direct method (i.e. empirical P-value)
- The P-value of association adjusted for the number of variants tested in cis given by the fitted beta distribution. We strongly recommend using this adjusted P-value in any downstream analysis

QTLtools nominal pass columns:
- The phenotype ID (and in the somatic eQTL file, gene symbol)
- The chromosome ID of the phenotype
- The start position of the phenotype
- The end position of the phenotype
- The strand orientation of the phenotype
- The total number of variants tested in cis
- The distance between the phenotype and the tested variant (accounting for strand orientation)
- The ID of the tested variant
- The chromosome ID of the variant
- The start position of the variant
- The end position of the variant
- The nominal P-value of association between the variant and the phenotype
- The corresponding regression slope
- A binary flag equal to 1 if the variant is the top variant in cis
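
As a reading aid, the following minimal Python sketch shows one way to load a permutation pass file and select significant germline eQTLs from the beta-adjusted P-value. It rests on several assumptions that are not confirmed by the description: the file is whitespace-delimited with exactly the 19 columns in the order listed above, the column labels (`phe_id`, `adj_beta_pval`, and so on) and function names are hypothetical names chosen here, and the Benjamini-Hochberg procedure stands in for the FDR adjustment, since the description does not say which FDR method was used. The demo rows are synthetic and are not taken from the dataset.

```python
import io
import math

# Hypothetical labels for the 19 permutation-pass columns, in the order
# given in the column description. The names are not defined by the dataset.
PERM_COLUMNS = [
    "phe_id", "phe_chr", "phe_start", "phe_end", "phe_strand",
    "n_var_in_cis", "dist_phe_var", "var_id", "var_chr", "var_start",
    "var_end", "dof", "dummy", "beta_shape1", "beta_shape2",
    "nom_pval", "slope", "adj_emp_pval", "adj_beta_pval",
]


def read_permutation_pass(handle):
    """Parse whitespace-delimited permutation-pass rows into dicts."""
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) != len(PERM_COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(PERM_COLUMNS, fields))
        try:
            pval = float(row["adj_beta_pval"])
        except ValueError:
            continue  # skip a header line or a non-numeric value such as "NA"
        if math.isnan(pval):
            continue  # skip phenotypes without a usable adjusted P-value
        row["adj_beta_pval"] = pval
        rows.append(row)
    return rows


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_significant(rows, threshold=0.01):
    """Keep rows whose FDR-adjusted beta P-value is at or below threshold."""
    fdr = benjamini_hochberg([row["adj_beta_pval"] for row in rows])
    return [{**row, "fdr": q} for row, q in zip(rows, fdr) if q <= threshold]


# Synthetic two-row example (made-up values, for illustration only).
demo = io.StringIO(
    "GENE_A 1 1000 1000 + 250 -1200 var1 1 2200 2200 590 0 1.02 210.5 1e-12 0.85 0.0001 1e-09\n"
    "GENE_B 1 5000 5000 - 310 40000 var2 1 45000 45000 590 0 0.98 180.2 0.02 -0.12 0.6 0.61\n"
)
for hit in select_significant(read_permutation_pass(demo)):
    print(hit["phe_id"], hit["var_id"], hit["fdr"])
```

The default threshold of 0.01 mirrors the germline cutoff stated in the description. For the somatic nominal pass results, the same adjustment function could be applied to the nominal P-value column with the 0.2 threshold, again under the assumption that Benjamini-Hochberg is an acceptable stand-in.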
Provider:
figshare
Created:
2024-07-21



