
deconvMe data

Source: DataCite Commons | Updated: 2025-08-20 | Indexed: 2026-04-25
Download link:
https://figshare.com/articles/dataset/methyldeconv_data/28563854/2
Description:
We generated sample-matched readouts from PBMCs of patients chronically infected with hepatitis C virus: DNA methylation (DNAm) using Illumina EPIC arrays, gene expression using RNA-seq, and proportions of selected immune cells using flow cytometry. This repository contains the processed TPM and count matrices for RNA-seq, the beta matrix and two matrices quantifying the methylated and unmethylated probes, and a metadata table with sample IDs, gender, and flow-cytometry-derived immune cell-type fractions:

RNA-seq: tpm.csv, counts.csv
DNAm: meth.csv, unmeth.csv, beta.csv
Metadata + flow cytometry: meta_flow.csv

Raw fastq files from the RNA sequencing were processed using the nf-core rnaseq pipeline version 3.10. The main steps include a manual inspection of quality using FastQC, followed by pseudo-alignment and gene quantification using Salmon. In addition to raw gene expression counts, this method generates transcripts per million (TPM) values, which are normalized by gene length and library size and were used as input for deconvolution methods.

DNAm data from Illumina EPIC arrays were generated as raw idat files and processed with RnBeads 2.0 using the default preprocessing settings, which remove methylation sites with too many missing values, cross-reactive probes, and probes that overlap with known single-nucleotide polymorphisms (SNPs). Additionally, we removed all CpGs with NA values in any sample. The final dataset consisted of 724,082 CpGs. We removed CpGs located on sex chromosomes as they would confound the analysis.
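To make the TPM normalization mentioned above concrete, here is a minimal Python sketch of the standard TPM calculation for a single sample. It is an illustration only: Salmon computes TPM itself using effective transcript lengths, and the function name `counts_to_tpm`, the gene names, and the toy numbers are assumptions that do not come from the dataset.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts for one sample to TPM.

    counts:     dict mapping gene -> read count
    lengths_bp: dict mapping gene -> gene length in base pairs
    (Both inputs are hypothetical; they are not files from the repository.)
    """
    # Step 1: reads per kilobase, which corrects for gene length.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    # Step 2: rescale so the sample sums to one million,
    # which corrects for library size.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {g: 0.0 for g in counts}
    return {g: v / scale for g, v in rpk.items()}


# Toy values for illustration, not taken from the dataset.
toy_counts = {"geneA": 100, "geneB": 200, "geneC": 0}
toy_lengths = {"geneA": 1000, "geneB": 4000, "geneC": 2000}
tpm = counts_to_tpm(toy_counts, toy_lengths)
```

In this toy example geneB has more reads than geneA but is four times longer, so its TPM is lower. Because every sample sums to one million, TPM values are comparable across samples in a way raw counts are not, which is why they suit deconvolution input.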

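The DNAm filtering steps described above (dropping CpGs with a missing value in any sample and CpGs on sex chromosomes) can be sketched in plain Python as follows. This is not the RnBeads implementation. The beta formula is the commonly used ratio M / (M + U + offset), and whether beta.csv was computed with an offset is not stated in the description. The probe IDs, chromosome labels, and function names are hypothetical.

```python
import math


def beta_value(meth, unmeth, offset=0.0):
    """Common beta-value definition: methylated / (methylated + unmethylated + offset)."""
    total = meth + unmeth + offset
    return meth / total if total > 0 else float("nan")


def filter_cpgs(beta_rows, chrom_of):
    """Keep CpGs with no missing value in any sample that are not on chrX or chrY.

    beta_rows: dict mapping CpG id -> list of beta values, one per sample
               (None or NaN marks a missing value)
    chrom_of:  dict mapping CpG id -> chromosome name (assumed annotation)
    """
    kept = {}
    for cpg, values in beta_rows.items():
        has_missing = any(v is None or math.isnan(v) for v in values)
        on_sex_chrom = chrom_of.get(cpg) in {"chrX", "chrY"}
        if not has_missing and not on_sex_chrom:
            kept[cpg] = values
    return kept


# Toy data for illustration; IDs and intensities are made up.
toy_beta = {
    "cg01": [beta_value(800, 200), beta_value(100, 900)],
    "cg02": [0.5, None],   # missing in one sample, so it is dropped
    "cg03": [0.3, 0.4],    # located on chrX, so it is dropped
}
toy_chrom = {"cg01": "chr1", "cg02": "chr2", "cg03": "chrX"}
kept = filter_cpgs(toy_beta, toy_chrom)
```

Only `cg01` survives both filters in this toy example. On the real matrices the same logic would typically be applied with a data-frame library rather than dictionaries.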
Provider:
figshare
Created:
2025-08-19