Averaging Strategy for Interpretable Machine Learning on Small Datasets to Understand Element Uptake after Seed Nanotreatment
NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://figshare.com/articles/dataset/Averaging_Strategy_for_Interpretable_Machine_Learning_on_Small_Datasets_to_Understand_Element_Uptake_after_Seed_Nanotreatment/23988018
Description:
Understanding plant uptake and translocation of nanomaterials is crucial for ensuring the successful and sustainable application of seed nanotreatment. Here, we collect a dataset of 280 experimental instances for predicting the relative metal/metalloid concentration (RMC) in maize seedlings after seed priming with various metal and metalloid oxide nanoparticles. To obtain unbiased predictions and explanations on small datasets, we present an averaging strategy and add a dimension to interpretable machine learning. Post-hoc interpretations of sophisticated LightGBM models show that solubility is highly correlated with model performance. The surface area, concentration, zeta potential, and hydrodynamic diameter of the nanoparticles, together with the seedling part and the relative weight of the plants, are the dominant factors affecting RMC, and their effects and interactions are explained. Furthermore, self-interpretable models built with the RuleFit algorithm successfully predict RMC based only on six important features identified by the post-hoc explanations. We then develop a visualization tool called RuleGrid to depict feature effects and interactions across the numerous generated rules. Different methods yield consistent parameter-RMC relationships. This study offers a promising interpretable, data-driven approach to expanding knowledge of nanoparticle fate in plants and may profoundly contribute to the safety-by-design of nanomaterials in agricultural and environmental applications.
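The description does not spell out how the averaging strategy works. The following is a minimal sketch of the general idea, assuming it means repeating many random train/test splits of the small dataset and averaging both the test score and a post-hoc feature importance across the repeats. It is not the study's implementation: a plain-Python k-nearest-neighbors regressor and permutation importance stand in for the LightGBM models and post-hoc explainers, and all function names (`averaged_evaluation`, `permutation_importance`, `knn_predict`), the 20% test fraction, the repeat count, and the toy data are hypothetical.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination (R^2) of predictions against targets."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def knn_predict(X_train, y_train, X_test, k=3):
    """Stand-in model (assumption, not LightGBM): k-NN regression, Euclidean distance."""
    preds = []
    for x in X_test:
        nearest = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xt)), yt)
            for xt, yt in zip(X_train, y_train)
        )[:k]
        preds.append(statistics.fmean(yt for _, yt in nearest))
    return preds


def permutation_importance(X_train, y_train, X_test, y_test, rng, k=3):
    """Test-set R^2 and, per feature, the R^2 drop after shuffling that feature."""
    base = r2_score(y_test, knn_predict(X_train, y_train, X_test, k))
    drops = []
    for j in range(len(X_test[0])):
        column = [row[j] for row in X_test]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_test, column)]
        drops.append(base - r2_score(y_test, knn_predict(X_train, y_train, X_perm, k)))
    return base, drops


def averaged_evaluation(X, y, n_repeats=50, test_frac=0.2, k=3, seed=0):
    """Average test R^2 and feature importances over many random train/test splits.

    The split ratio and repeat count are illustrative assumptions, not values
    taken from the study.
    """
    rng = random.Random(seed)
    n_test = max(1, int(len(X) * test_frac))
    scores, importances = [], []
    for _ in range(n_repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)  # a fresh random split on every repeat
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        score, drops = permutation_importance(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx], rng, k,
        )
        scores.append(score)
        importances.append(drops)
    # Averaging across splits damps the variance that any single split of a small dataset carries.
    mean_importance = [statistics.fmean(col) for col in zip(*importances)]
    return statistics.fmean(scores), mean_importance


if __name__ == "__main__":
    # Synthetic toy data (hypothetical): the target depends on feature 0 only.
    data_rng = random.Random(42)
    X = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(60)]
    y = [3.0 * a + data_rng.gauss(0, 0.1) for a, _ in X]
    score, importance = averaged_evaluation(X, y)
    print(f"mean R^2 = {score:.2f}, mean importances = {[round(v, 2) for v in importance]}")
```

On this toy data the averaged importance of feature 0 should clearly exceed that of feature 1. The design point is that a single split of a few hundred instances can give a noisy score and ranking, whereas averaging over many splits gives a steadier estimate of both.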
Created:
2023-08-18



