Identification of Relevant Phytochemical Constituents for Characterization and Authentication of Tomatoes by General Linear Model Linked to Automatic Interaction Detection (GLM-AID) and Artificial Neural Network Models (ANNs)

NIAID Data Ecosystem, indexed 2026-03-08

Download link:

https://figshare.com/articles/dataset/_Identification_of_Relevant_Phytochemical_Constituents_for_Characterization_and_Authentication_of_Tomatoes_by_General_Linear_Model_Linked_to_Automatic_Interaction_Detection_GLM_AID_and_Artificial_Neural_Network_Models_ANNs_/1449383

Description:

There are a large number of tomato cultivars with a wide range of morphological, chemical, nutritional and sensory characteristics. Many factors are known to affect the nutrient content of tomato cultivars. A complete understanding of the effect of these factors would require an exhaustive experimental design, a multidisciplinary scientific approach and a suitable statistical method. Multivariate analytical techniques such as Principal Component Analysis (PCA) and Factor Analysis (FA) have been widely applied to search for patterns in the data and to reduce the dimensionality of a data set by replacing the original variables with a new set of uncorrelated latent variables. However, in some cases it is not useful to replace the original variables with these latent variables. In this study, the Automatic Interaction Detection (AID) algorithm and Artificial Neural Network (ANN) models were applied as alternatives to PCA, FA and other multivariate analytical techniques in order to identify the relevant phytochemical constituents for characterization and authentication of tomatoes. To test the feasibility of the AID algorithm and ANN models for this purpose, both methods were applied to a data set of 25 chemical parameters analysed in 167 tomato samples from Tenerife (Spain). Each tomato sample was defined by three factors: cultivar, agricultural practice and harvest date. The tree structure of the General Linear Model linked to AID (GLM-AID) was organized into 3 levels, corresponding to the number of factors. p-Coumaric acid was the compound that allowed the tomato samples to be distinguished according to harvest date. More than one chemical parameter was necessary to distinguish among the agricultural practices and among the tomato cultivars. Several ANN models, with 25 and 10 input variables, were developed for the prediction of cultivar, agricultural practice and harvest date. Finally, the models with 10 input variables were chosen, with goodness of fit between 44 and 100%. The lowest fits were for cultivar classification; this low percentage suggests that other kinds of chemical parameters should be used to identify tomato cultivars.
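The idea of picking out the single chemical parameter that best separates the levels of one factor (as p-coumaric acid does for harvest date) can be sketched as follows. This is a simplified stand-in, not the GLM-AID procedure used in the study: it ranks parameters by a one-way ANOVA F statistic, and the column names (`param_a`, `param_b`, `param_c`), the group labels and all values are synthetic and hypothetical, not taken from the data set.

```python
import random
from collections import defaultdict


def f_statistic(values, labels):
    """One-way ANOVA F statistic of `values` grouped by `labels`."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the group means around the grand mean.
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own group mean.
    within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return (between / (k - 1)) / (within / (n - k))


def rank_parameters(table, labels):
    """Rank the columns of `table` (name -> list of values) by F, highest first."""
    scores = {name: f_statistic(col, labels) for name, col in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic illustration only: three hypothetical parameters, two harvest groups.
rng = random.Random(0)
harvest = ["early"] * 20 + ["late"] * 20
table = {
    "param_a": [rng.gauss(1.0, 0.2) for _ in harvest],  # no harvest effect
    "param_b": [rng.gauss(5.0 if h == "early" else 8.0, 0.5) for h in harvest],
    "param_c": [rng.gauss(3.0, 1.0) for _ in harvest],  # no harvest effect
}

ranking = rank_parameters(table, harvest)
print(ranking[0][0])  # name of the parameter that best separates the groups
```

An AID-style tree would repeat a step of this kind within each resulting subgroup for the next factor; the sketch stops at a single level.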

Created:

2016-01-15