Lineage-specific lncRNAs critically determine cross-species differences in tumors
Source: DataCite Commons. Updated: 2026-05-04. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19964973
Resource description:
### Overview
This repository contains all data and code supporting the analyses presented in the manuscript. The study develops a lineage-specific lncRNA (LS lncRNA)-centered comparative pan-cancer framework integrating 9,058 RNA-seq samples from 13 human tumors and their matched mouse counterparts to systematically investigate how LS lncRNAs drive transcriptional divergence, reshape cancer hallmark landscapes, and influence the tumor immune microenvironment (TIME) across species.
The 13 tumor types analyzed include bladder, breast, kidney, liver, lung, pancreatic, prostate, ovarian, skin (melanoma), glioblastoma, glioma, leukemia, and lymphoma.
### Repository Structure
.
├── TranscriptionalAnalysis_code/
├── TIME_analysis_code/
├── eGRAM-Code/
├── panData-geneExp_NX/
├── panData-geneExp_logTPM/
├── eGRAM-inputs/
├── eGRAM-results/
├── UMAP-plots/
└── compare-results/
### Contents Description
1. **TranscriptionalAnalysis_code**: Scripts for transcriptomic data processing, normalization, cross-species differential expression analysis, and module construction.
2. **TIME_analysis_code**: Scripts for cross-species immune infiltration comparison and immune divergence module identification.
3. **eGRAM-Code**: Source code for the *eGRAM* (expression-based Gene Regulatory Analysis of Modules) program, which integrates lncRNA-DNA binding site (DBS) data with expression correlation to identify transcriptional regulatory modules.
4. **panData-geneExp_NX**: Pan-cancer normalized expression (NX) datasets for all 13 human and 13 mouse tumor types.
NX values are z-scores computed from ComBat-corrected log2TPM matrices using scikit-learn `preprocessing.StandardScaler`, normalized jointly across all 13 cancer types and 11 normal tissue types within each species. These matrices serve as the primary input for cross-species differential expression analysis, TDG/TCG classification, and ANOSIM/t-SNE quality control.
5. **panData-geneExp_logTPM**: Pan-cancer log2(TPM + 1) expression datasets (after TMM normalization and ComBat batch correction) for all 13 human and 13 mouse tumor types.
These matrices are used as input for eGRAM module identification, LS lncRNA expression quantification, and TIME deconvolution. Genes with TPM < 0.1 in more than 80% of samples have been removed.
6. **eGRAM-inputs**: Input files for the eGRAM program.
7. **eGRAM-results**: Output files from eGRAM analysis across all 13 human and 13 mouse tumors under three conditions (Normal, Cancer, Preserved).
8. **UMAP-plots**: UMAP projection outputs and visualization data for the hallmark landscape and target-gene landscape analyses.
9. **compare-results**: Results from cross-species comparative analyses.
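The preprocessing described for panData-geneExp_logTPM and panData-geneExp_NX can be illustrated with a minimal sketch. It is not the repository's code: the function names, the toy matrix, and the genes-as-rows layout are all assumptions made for illustration. The sketch also assumes that each gene is z-scored across samples, which the description does not state explicitly, and it omits the TMM normalization and ComBat batch correction that the real pipeline applies before z-scoring.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def filter_low_expression(tpm, min_tpm=0.1, max_low_fraction=0.8):
    """Drop genes with TPM < min_tpm in more than max_low_fraction of samples.

    Assumed layout: genes as rows, samples as columns.
    """
    low_fraction = (tpm < min_tpm).mean(axis=1)
    return tpm.loc[low_fraction <= max_low_fraction]


def log2_tpm(tpm):
    """Return log2(TPM + 1)."""
    return np.log2(tpm + 1.0)


def nx_zscores(log_tpm):
    """Z-score each gene across all samples (assumed orientation).

    StandardScaler standardizes columns, so the matrix is transposed to
    samples x genes before scaling and transposed back afterwards.
    """
    scaled = StandardScaler().fit_transform(log_tpm.T.to_numpy())
    return pd.DataFrame(scaled.T, index=log_tpm.index, columns=log_tpm.columns)


# Hypothetical toy matrix: 3 genes x 5 samples (values are made up).
tpm = pd.DataFrame(
    {
        "s1": [10.0, 0.00, 3.0],
        "s2": [20.0, 0.05, 1.0],
        "s3": [30.0, 0.00, 7.0],
        "s4": [40.0, 0.00, 15.0],
        "s5": [50.0, 0.02, 31.0],
    },
    index=["geneA", "geneB", "geneC"],
)

filtered = filter_low_expression(tpm)  # geneB is below 0.1 TPM in every sample
log_tpm = log2_tpm(filtered)           # TMM and ComBat would precede z-scoring in the real pipeline
nx = nx_zscores(log_tpm)
print(nx.round(3))
```

In practice the z-scoring would be applied once per species to a matrix that pools all cancer and normal samples, so that NX values are comparable across tissue types within that species.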
Provider:
Zenodo
Created:
2026-05-02



