
Averaging Strategy for Interpretable Machine Learning on Small Datasets to Understand Element Uptake after Seed Nanotreatment

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/Averaging_Strategy_for_Interpretable_Machine_Learning_on_Small_Datasets_to_Understand_Element_Uptake_after_Seed_Nanotreatment/23988018
Resource description:
Understanding plant uptake and translocation of nanomaterials is crucial for ensuring the successful and sustainable applications of seed nanotreatment. Here, we collect a dataset with 280 instances from experiments for predicting the relative metal/metalloid concentration (RMC) in maize seedlings after seed priming by various metal and metalloid oxide nanoparticles. To obtain unbiased predictions and explanations on small datasets, we present an averaging strategy and add a dimension for interpretable machine learning. The findings in post-hoc interpretations of sophisticated LightGBM models demonstrate that solubility is highly correlated with model performance. Surface area, concentration, zeta potential, and hydrodynamic diameter of nanoparticles and seedling part and relative weight of plants are dominant factors affecting RMC, and their effects and interactions are explained. Furthermore, self-interpretable models using the RuleFit algorithm are established to successfully predict RMC only based on six important features identified by post-hoc explanations. We then develop a visualization tool called RuleGrid to depict feature effects and interactions in numerous generated rules. Consistent parameter-RMC relationships are obtained by different methods. This study offers a promising interpretable data-driven approach to expand the knowledge of nanoparticle fate in plants and may profoundly contribute to the safety-by-design of nanomaterials in agricultural and environmental applications.
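The description above does not spell out how the averaging strategy works, so the following is only a minimal sketch of one common reading: repeat the train/test split many times on a small dataset and average both the test score and a fitted model quantity, so that no single split dominates the conclusion. Everything here is an assumption for illustration: the data are synthetic, the least-squares line stands in for the LightGBM models named in the description, and the helper names (`fit_line`, `r2_score`, `averaged_evaluation`) are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (a stand-in for a real model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def r2_score(ys, preds):
    """Coefficient of determination on held-out data."""
    my = statistics.fmean(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0


def averaged_evaluation(xs, ys, n_repeats=50, test_fraction=0.2, seed=0):
    """Average the fitted slope and test R^2 over many random splits."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(2, int(n * test_fraction))
    slopes, scores = [], []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split on every repeat
        test, train = idx[:n_test], idx[n_test:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in test]
        slopes.append(a)
        scores.append(r2_score([ys[i] for i in test], preds))
    return {
        "mean_slope": statistics.fmean(slopes),
        "mean_r2": statistics.fmean(scores),
        "sd_r2": statistics.stdev(scores),
    }


# Synthetic small dataset (assumed): y = 2x + 1 plus Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(40)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0, 1.0) for x in xs]

summary = averaged_evaluation(xs, ys)
print(summary)
```

The spread of the score across splits (`sd_r2`) gives a rough sense of how much a single split could mislead on a dataset of this size, which is the motivation for averaging in the first place.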

Created:
2023-08-18