Additional file 1 of Integrative metagenomic, transcriptomic, and proteomic analysis reveal the microbiota-host interplay in early-stage lung adenocarcinoma among non-smokers

Name: Additional file 1 of Integrative metagenomic, transcriptomic, and proteomic analysis reveal the microbiota-host interplay in early-stage lung adenocarcinoma among non-smokers
Creator: figshare
Published: 2025-06-25 03:26:50
License: No description available

DataCite Commons; updated 2025-06-25; indexed 2024-08-19

Download link:

https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Integrative_metagenomic_transcriptomic_and_proteomic_analysis_reveal_the_microbiota-host_interplay_in_early-stage_lung_adenocarcinoma_among_non-smokers/26289146

Description:

Additional file 1:

Figure S1. Quality control of metagenomic data. A: Comparison of reads that passed quality control after filtering in 129 samples. B: Specaccum species accumulation curves of the three groups. C: Rarefaction curves showing observed species richness in the 129 samples. D: Overall kingdom-level taxa distribution of the microbiome in the three groups.

Figure S2. Microbial compositions in the cohort. Microbial compositions of patients with early-stage lung adenocarcinoma (ES-LUAD) and healthy controls (HCs) at the phylum (A), genus (B), and species (C) levels. The top 10/20 most abundant microbial taxa are shown in different gradient colors. The microbial composition is arranged in order of the most abundant taxonomic ranks.

Figure S3. Representative microbes exhibiting significant alterations between patients with ES-LUAD and HCs. *** p < 0.001 as determined by Kruskal–Wallis test.

Figure S4. Correlation between intrapulmonary microbiota and clinical features. A, D: Comparison of the alpha diversity (Chao1/Shannon/Simpson index) and beta diversity (Bray–Curtis distance) at the species level with tumor infiltration in patients with ES-LUAD. B, E: The same comparison with the solid component of the tumor in patients with ES-LUAD. C, F: The same comparison with multiple-primary nodules in patients with ES-LUAD. Box plots show median ± quartiles, and the whiskers extend from the hinge to the largest or smallest value no further than 1.5-fold of the interquartile range. ns: Not significant, p-value as determined by Wilcoxon rank-sum test. AIS: Adenocarcinoma in situ, MIA: Minimally invasive adenocarcinoma, IA: Invasive adenocarcinoma, pGGN: Pure ground glass nodules, mGGN: Mixed ground glass nodules, SN: Solid nodule.

Figure S5. Overview of transcriptome data. A: RNA-Seq passed reads sequenced by Illumina NovaSeq 6000 Nanopore platforms (Wilcoxon rank-sum test). B: Clustering heatmap of the differentially expressed genes (DEGs) between patients with ES-LUAD and HCs (DESeq2, |log2FC| > 1). C: PCA reveals differences in the transcriptomes of patients with ES-LUAD and HCs. D: Volcano plot showing the significant DEGs between patients with ES-LUAD and HCs (DESeq2, |log2FC| > 1).

Figure S6. Identification of ES-LUAD-related mRNAs in the transcriptome dataset through weighted gene co-expression network analysis (WGCNA). A–D: Network fitting calculations with fitted curves for selected network construction parameters. A: Correlation coefficient corresponding to different power values. B: Average connectivity of the network constructed with different power values. At a power of 8, both the correlation coefficient and the average connectivity of the network are relatively high, so a power of 8 is used in the subsequent module construction. C: The distribution of network connectivity when the power is 8. D: The test result of the power law distribution. As can be seen from the figure, k and p(k) are negatively correlated (correlation coefficient: 0.85), indicating that the selected power value enables the establishment of a scale-free network of genes. E: The result of weighted co-expression network construction. F: Heatmap of correlation analysis between modules and clinical traits. G: Gene expression information statistics within modules.

Figure S7. Overview of proteomic data. A: QC sample correlation, indicating process stability. B: Clustering heatmap of the differentially expressed proteins (DEPs) between patients with ES-LUAD and HCs (Wilcoxon rank-sum test, log2 fold change > 1). C: PCA reveals differences in the proteome of patients with ES-LUAD and HCs. D: Volcano plot showing the significant DEPs between patients with ES-LUAD and HCs (Wilcoxon rank-sum test, log2 fold change > 1).

Figure S8. Validation and prognostic information of DEGs and DEPs in public databases. The top shows the expression of DEGs, and the bottom shows the expression of DEPs. The middle shows overall survival (OS) and disease-free survival (DFS). Solid lines indicate significance at p < 0.05 (Mantel–Cox test).

Figure S9. Random forest model based on multi-omics data. A: The left panel shows the validation cohort ROC curve for the random forest model built on 3000 proteins (training AUC = 1). The middle panel depicts the selection of the optimal feature count based on 10-fold cross-validation. The right panel shows the ROC curve for the top 150 proteins in the validation cohort (training AUC = 1). B: The left panel shows the validation cohort ROC curve for the random forest model built on 13846 mRNAs (training AUC = 1). The middle panel depicts the selection of the optimal feature count based on 10-fold cross-validation. The right panel shows the ROC curve for the top 500 mRNAs in the validation cohort (training AUC = 1). C: The left panel shows the validation cohort ROC curve for the random forest model built on 196 KO genes (training AUC = 1). The middle panel depicts the selection of the optimal feature count based on 10-fold cross-validation. D: The left panel shows the validation cohort ROC curve for the random forest model built on 398 microbes and 3000 proteins (training AUC = 1). The middle panel depicts the selection of the optimal feature count based on 10-fold cross-validation. The right panel shows the ROC curve for the top 45 microbes and proteins in the validation cohort (training AUC = 1).
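For readers unfamiliar with the diversity measures named in the Figure S4 legend (Chao1, Shannon, Simpson, Bray–Curtis), the following is a minimal Python sketch of their textbook definitions. It is not the authors' pipeline, which this record does not describe. The function names and the example counts are hypothetical. The Simpson index is shown in its Gini–Simpson form (1 − Σp²) and Chao1 in its bias-corrected form. Both are assumptions, since several conventions exist and the legend does not say which was used.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2) (one of several Simpson conventions)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same taxa:
    sum|a_i - b_i| / sum(a_i + b_i)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den


# Hypothetical species-level read counts for two samples (same taxon order).
sample_a = [10, 5, 1, 1, 2, 0]
sample_b = [8, 0, 3, 1, 0, 6]

alpha_a = {
    "shannon": shannon(sample_a),
    "gini_simpson": gini_simpson(sample_a),
    "chao1": chao1(sample_a),
}
beta_ab = bray_curtis(sample_a, sample_b)
```

Alpha diversity is computed per sample, whereas the Bray–Curtis value is pairwise, which is why the legend reports them in separate panels (A–C versus D–F).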

Provider:

figshare

Created:

2024-07-13