Lineage-specific lncRNAs critically determine cross-species differences in tumors
Source: DataCite Commons. Updated 2026-05-04; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19964974
Resource description:
Overview
This repository contains all data and code supporting the analyses presented in the manuscript. The study develops a lineage-specific lncRNA (LS lncRNA)-centered comparative pan-cancer framework integrating 9,058 RNA-seq samples from 13 human tumors and their matched mouse counterparts to systematically investigate how LS lncRNAs drive transcriptional divergence, reshape cancer hallmark landscapes, and influence the tumor immune microenvironment (TIME) across species.
The 13 tumor types analyzed include bladder, breast, kidney, liver, lung, pancreatic, prostate, ovarian, skin (melanoma), glioblastoma, glioma, leukemia, and lymphoma.
Repository Structure
.
├── TranscriptionalAnalysis_code/
├── TIME_analysis_code/
├── eGRAM-Code/
├── panData-geneExp_NX/
├── panData-geneExp_logTPM/
├── eGRAM-inputs/
├── eGRAM-results/
├── UMAP-plots/
└── compare-results/
Contents Description
1. TranscriptionalAnalysis_code
Scripts for transcriptomic data processing, normalization, cross-species differential expression analysis, and module construction.
2. TIME_analysis_code
Scripts for cross-species immune infiltration comparison and immune divergence module identification.
3. eGRAM-Code
Source code for the *eGRAM* (expression-based Gene Regulatory Analysis of Modules) program, which integrates lncRNA-DNA binding site (DBS) data with expression correlation to identify transcriptional regulatory modules.
4. panData-geneExp_NX
Pan-cancer normalized expression (NX) datasets for all 13 human and 13 mouse tumor types.
NX values are z-scores computed from ComBat-corrected log2TPM matrices using scikit-learn `preprocessing.StandardScaler`, normalized jointly across all 13 cancer types and 11 normal tissue types within each species. These matrices serve as the primary input for cross-species differential expression analysis, TDG/TCG classification, and ANOSIM/t-SNE quality control.
5. panData-geneExp_logTPM
Pan-cancer log2(TPM + 1) expression datasets (post-TMM normalization and ComBat batch correction) for all 13 human and 13 mouse tumor types.
These matrices are used as input for eGRAM module identification, LS lncRNA expression quantification, and TIME deconvolution. Genes with TPM < 0.1 in more than 80% of samples have been removed.
6. eGRAM-inputs
Input files for the eGRAM program.
7. eGRAM-results
Output files from eGRAM analysis across all 13 human and 13 mouse tumors under three conditions (Normal, Cancer, Preserved).
8. UMAP-plots
UMAP projection outputs and visualization data for hallmark landscape and target-gene landscape analyses.
9. compare-results
Results from cross-species comparative analyses.
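
The relationship between the panData-geneExp_logTPM and panData-geneExp_NX matrices can be illustrated with a minimal sketch. It is not the repository's code: the function names, the gene-to-values dictionary layout, the toy TPM values, and the order of filtering before the log transform are all assumptions made for illustration, and the TMM and ComBat steps are not reproduced. The z-score is written in plain Python with the population standard deviation, which is the estimator scikit-learn's `StandardScaler` uses; it assumes each gene is standardized across all samples of a species jointly.

```python
import math
import statistics


def filter_low_expression(tpm, min_tpm=0.1, max_low_fraction=0.8):
    """Keep a gene unless its TPM is below min_tpm in more than
    max_low_fraction of samples.

    tpm maps a gene name to a list of TPM values, one per sample
    (a hypothetical layout chosen for this sketch).
    """
    kept = {}
    for gene, values in tpm.items():
        low_fraction = sum(v < min_tpm for v in values) / len(values)
        if low_fraction <= max_low_fraction:
            kept[gene] = values
    return kept


def to_log_tpm(tpm):
    """Apply log2(TPM + 1) to every value."""
    return {g: [math.log2(v + 1.0) for v in vals] for g, vals in tpm.items()}


def to_nx(log_tpm):
    """Z-score each gene across all samples using the population SD."""
    nx = {}
    for gene, values in log_tpm.items():
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        # A constant gene has SD 0; return zeros rather than divide by zero.
        nx[gene] = [(v - mean) / sd if sd > 0 else 0.0 for v in values]
    return nx


# Toy TPM values for three made-up genes across six samples.
tpm = {
    "GENE_A": [0.0, 0.05, 0.0, 0.02, 0.0, 3.0],    # low in 5 of 6 samples
    "GENE_B": [1.0, 3.0, 7.0, 15.0, 31.0, 63.0],   # expressed everywhere
    "GENE_C": [0.0, 0.0, 0.0, 0.0, 5.0, 9.0],      # low in 4 of 6 samples
}

filtered = filter_low_expression(tpm)
log_tpm = to_log_tpm(filtered)
nx = to_nx(log_tpm)

for gene, values in nx.items():
    print(gene, [round(v, 3) for v in values])
```

In this toy input GENE_A is low in 5 of 6 samples (about 83%), so it is removed, while GENE_C is low in 4 of 6 (about 67%) and is kept. After standardization every retained gene has mean 0 and unit variance across samples, so values are comparable between genes but no longer reflect absolute abundance.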
Provider:
Zenodo
Created:
2026-05-02



