Data for: Wild blueberry yield prediction using a combination of computer simulation and machine learning algorithms
Source: DataCite Commons | Updated: 2025-05-01 | Indexed: 2025-05-17
Download link:
https://data.mendeley.com/datasets/p5hvjzsvn8
Description:
A considerable amount of research is underway in the agricultural sector to better predict crop yield using machine learning algorithms. Many machine learning algorithms require large amounts of data to give useful results. One of the major challenges in training and experimenting with machine learning algorithms is the availability of training data of sufficient quality and quantity. In the paper “Wild blueberry yield prediction using a combination of computer simulation and machine learning algorithms”, we used a dataset generated by the Wild Blueberry Pollination Model, a spatially explicit simulation model validated by field observations and experimental data collected in Maine, USA, over the last 30 years. Blueberry yield prediction models require data that sufficiently characterize the influence of plant spatial traits, bee species composition, and weather conditions on production. In a multi-step process, we designed simulation experiments and conducted the runs on the calibrated version of the blueberry simulation model. The simulated dataset was then examined, and important features were selected to build four machine-learning-based predictive models. This simulated dataset is a resource for researchers who have actual data collected from field observation and for those who want to experiment with how machine learning algorithms respond to real data and to data generated by computer simulation modelling as input for crop yield prediction models.
Provider:
Mendeley
Created:
2020-09-12



