
FITTING Data Mining Settings for Ranking Seed Lots

Source: DataCite Commons. Updated 2023-05-13. Indexed 2024-08-18.
Download link:
https://scielo.figshare.com/articles/dataset/FITTING_Data_Mining_Settings_for_Ranking_Seed_Lots/22785544
Description:
ABSTRACT To enhance speed and agility in interpreting physiological quality tests of seeds, the use of algorithms has emerged. This study aimed to identify suitable machine learning models to assist in the precise management of seed lot quality. Soybean lots from two companies were assessed using the Supplied Test Set, Cross-Validation (with 8, 10, and 12 folds), and Percentage Split (with 66% and 70%) methods. Variables analyzed through Tetrazolium tests included vigor, viability, mechanical damage, moisture damage, bed bug damage, and water content. Method performance was determined by Kappa, Precision, and ROC Area metrics. Classification Via Regression and J48 algorithms were employed. The technique utilizing 66% of data for training achieved 93.55% accuracy, with Precision and ROC Area reaching 94.50% for the J48 algorithm. Applying the cross-validation method with 10 folds resulted in 90.22% of correctly classified instances, with a ROC Area similar to that of the previous method. Tetrazolium Vigor was the primary attribute used. However, these results are specific to this study's database, and careful planning is necessary to select the most effective application methods.
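The evaluation schemes and two of the metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' workflow: the function names, the unshuffled contiguous splits, and the "high"/"low" class labels are all assumptions made for illustration, and no values from the dataset are used.

```python
from collections import Counter


def percentage_split(n_items, train_fraction):
    """Split indices 0..n_items-1 into a leading training part and a trailing test part.

    Assumption: no shuffling; the first round(n_items * train_fraction) items train.
    """
    cut = round(n_items * train_fraction)
    indices = list(range(n_items))
    return indices[:cut], indices[cut:]


def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_items) if i < start or i >= stop]
        yield train, test
        start = stop


def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement between labels, corrected for chance agreement."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # only one class present on both sides; agreement is trivial
    return (observed - expected) / (1.0 - expected)


def precision(actual, predicted, positive):
    """Fraction of items predicted as `positive` that truly are `positive`."""
    hits = [a for a, p in zip(actual, predicted) if p == positive]
    if not hits:
        return 0.0
    return sum(a == positive for a in hits) / len(hits)


# Hypothetical labels for four seed lots (not taken from the dataset).
actual = ["high", "high", "low", "low"]
predicted = ["high", "low", "low", "low"]

train_idx, test_idx = percentage_split(100, 0.66)
folds = list(k_fold_indices(10, 3))
kappa = cohen_kappa(actual, predicted)
precision_low = precision(actual, predicted, "low")
```

With the hypothetical labels above, three of four predictions agree and chance agreement is 0.5, so kappa works out to 0.5. A 66% split of 100 items leaves 66 for training and 34 for testing.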

Provider:
SciELO journals
Created:
2023-05-09