Additional file 2 of Multimodal learning reveals plants’ hidden sensory integration logic

Indexed from NIAID Data Ecosystem on 2026-05-10

Download link:

https://figshare.com/articles/dataset/Additional_file_2_of_Multimodal_learning_reveals_plants_hidden_sensory_integration_logic/31877763

Description:

Supplementary Material 2:

Table S1. Gene markers used for correlation analysis with UMAP axes.

Figure S1. Functional annotation of effector-associated biological processes and protein domains. Enriched terms highlight iron/manganese ion homeostasis (e.g., transmembrane transport, vacuolar sequestration) mediated by VIT family transporters, alongside ATP-dependent RNA helicase activity (DEAD/DEAH box domains). Terms are clustered by functional similarity, reflecting coordinated roles in metal trafficking and RNA metabolism during effector activity.

Figure S2. Unimodal data separability and model calibration analysis. (A, B, C) Calibration curve and confidence distribution show that the model's predictions are well calibrated, with 50% of cases falling in the high-confidence range (0.75 to 0.92) and no evidence of overconfidence. (D) Principal component analysis (PCA) of transcriptomic data shows clear separation of effector groups (GLOIN781 vs. GLOIN707) along PC1 (78.3% variance explained). (E, F) Phenomic and metabolomic profiles exhibit partial overlap between effectors (RiSP749, GLOIN781, OPF, GLOIN707), highlighting the need for multimodal integration.

Figure S3. Extended analysis of phenotypic regression and embedding interpretability. (A-B) Trait-specific $$R^2$$ and mean squared error (MSE) scores from phenotypic regression. The $$R^2$$ scores highlight stronger predictability for architectural traits, while the corresponding mean squared errors reveal higher uncertainty in physiological traits such as anthocyanin accumulation. Performance was evaluated by training a ridge regression model on the learned latent CoMM embeddings Z as features to predict each phenotypic trait. (C) Top 20 most important embedding dimensions for genotype classification using random forest feature importance. (D) SHAP-based interpretability of embeddings per effector class (GLOIN707, GLOIN781, RiSP749, GFP), showing feature-level specificity across dimensions.

Figure S4. Cross-modal attention analysis between transcriptomic and metabolomic modalities. (A) Scatter plot comparing prior weights ($$P_{ij}$$) against learned attention weights ($$A_{ij}$$) for RNA-metabolite interactions, coloured by the delta value ($$\Delta = A_{ij} - P_{ij}$$). Points along the dashed identity line indicate interactions where the model maintained prior biological knowledge, while deviations represent novel discoveries or suppressed relationships. (B) Top 15 novel RNA-metabolite discoveries ranked by delta value, showing gene-metabolite pairs where the model learned stronger associations than the baseline prior. Gene identifiers (e.g., Solyc10g000881) are paired with metabolite names, revealing potential novel biological relationships. (C) Distribution of delta values across all RNA-metabolite interactions, showing the frequency of different magnitudes of deviation from prior expectations. The vertical dashed line at $$\Delta = 0$$ indicates no change from the prior. (D) Heatmap visualisation of the delta matrix for the first 50 RNA features against all 18 metabolite features, showing spatial patterns of enhanced (positive $$\Delta$$, red) and suppressed (negative $$\Delta$$, blue) cross-modal interactions. The analysis reveals both global patterns and specific feature-level modifications of biological priors through multimodal integration.

Figure S5. Detailed Cross-Modal Integration Hub. This schematic represents an integrative analysis framework connecting three primary data modalities: (1) metabolite profiles capturing biochemical states, (2) gene expression patterns, and (3) trait measurements including fractal dimension analysis, primary root length, and root swelling phenotypes. The hub facilitates the identification of multi-scale relationships between molecular components and macroscopic root architecture features, enabling comprehensive systems biology approaches to understanding root development and adaptation.
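
The delta values described for Figure S4 can be illustrated with a minimal sketch. It assumes the prior weights P and the learned attention weights A are available as equally shaped gene-by-metabolite matrices. The function names, the toy values, and the labels (gene_a, met_x, and so on) are hypothetical and do not come from the dataset or the authors' code.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def delta_matrix(attention: Matrix, prior: Matrix) -> Matrix:
    """Return the element-wise difference delta_ij = A_ij - P_ij."""
    same_shape = len(attention) == len(prior) and all(
        len(a_row) == len(p_row) for a_row, p_row in zip(attention, prior)
    )
    if not same_shape:
        raise ValueError("attention and prior must have the same shape")
    return [
        [a - p for a, p in zip(a_row, p_row)]
        for a_row, p_row in zip(attention, prior)
    ]


def top_discoveries(
    delta: Matrix, genes: List[str], metabolites: List[str], k: int = 15
) -> List[Tuple[str, str, float]]:
    """Rank gene-metabolite pairs with positive delta, largest first.

    A positive delta means the learned attention exceeds the prior,
    which is the sense of "novel discovery" used in panel (B).
    """
    pairs = [
        (genes[i], metabolites[j], d)
        for i, row in enumerate(delta)
        for j, d in enumerate(row)
        if d > 0
    ]
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs[:k]


if __name__ == "__main__":
    # Toy values for illustration only (not taken from the dataset).
    learned = [[0.75, 0.25], [0.5, 0.5]]
    prior = [[0.25, 0.25], [0.5, 0.75]]
    delta = delta_matrix(learned, prior)
    for gene, metabolite, d in top_discoveries(
        delta, ["gene_a", "gene_b"], ["met_x", "met_y"]
    ):
        print(gene, metabolite, d)
```

Pairs with a delta of zero sit on the identity line in panel (A), and negative deltas correspond to the suppressed relationships shown in blue in panel (D). This sketch ranks only the positive deltas, which is one plausible reading of how panel (B) was built.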

Created:

2026-02-19
