Input data for winter wheat yield forecasting
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/rbs52ww8v2
Description:
Data Description for County-Scale Winter Wheat Yield Prediction in Eastern China (2005-2022)
This dataset, provided as a single CSV file (winter_wheat_yield_prediction_data.csv), compiles input features for modeling winter wheat yield across 77 counties in eastern China from 2005 to 2022. Each row represents a unique county-year observation. The data were gathered from the following sources:
• County-level winter wheat yield (ton/ha): From official statistical yearbooks.
• Climate variables: Growing season (Oct-May) averages/totals for temperature, precipitation, and solar radiation from TerraClimate (1/24-degree), aggregated to county level.
• Remote sensing variables:
o Vegetation Indices (VIs): NDVI, EVI, and NIRv from MODIS (MOD13A2/MYD13A2), pre-processed and aggregated to county-level maximums/averages.
o Solar-Induced Chlorophyll Fluorescence (SIF): High-resolution GOSIF and CSIF, aggregated to county level, providing a direct proxy for photosynthetic activity.
• County-level planting area: From the National Earth System Science Data Center.
All data layers were matched by county and year. Missing values were handled via imputation; quality control removed outliers.
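As a quick sanity check on the "one row per county-year" structure described above, the following minimal sketch looks for repeated county-year keys. The column names (`county`, `year`, `yield_t_ha`, `ndvi_max`) and the sample rows are hypothetical; the actual header of winter_wheat_yield_prediction_data.csv is not documented here and may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the real file; the actual column
# names in winter_wheat_yield_prediction_data.csv may differ.
SAMPLE_CSV = """county,year,yield_t_ha,ndvi_max
A,2005,5.1,0.71
A,2006,5.4,0.74
B,2005,4.8,0.69
B,2005,4.9,0.70
"""


def duplicate_county_years(rows, key_cols=("county", "year")):
    """Return the sorted list of key tuples that occur more than once."""
    counts = Counter(tuple(row[col] for col in key_cols) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
duplicates = duplicate_county_years(rows)
print(duplicates)  # [('B', '2005')]
```

To run the same check on the downloaded file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and adjust `key_cols` to the real column names.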
Notable Findings & Interpretation:
Our modeling (using LASSO, RIDGE, SVR, RF, XGBoost, TabPFN) yielded key insights:
1. Synergistic Power: Integrating climate and remote sensing data delivered the most robust predictions ($R^2=0.72-0.81$), outperforming climate-only ($R^2=0.60-0.78$) or remote sensing-only ($R^2=0.43-0.65$) models. This highlights the value of capturing both environmental drivers and their biological manifestations.
2. SIF's Advantage: SIF generally outperformed VIs ($R^2_{max}=0.65$ vs. $R^2_{max}=0.62$) due to its direct link to photosynthesis. NIRv performed comparably to CSIF in remote sensing-only scenarios.
3. Dynamic Data Contributions: The contribution of each data source changes over the season. Climate data were crucial early on; remote sensing became more informative as the season progressed because it integrates cumulative weather effects.
4. Non-linear Model Superiority: Non-linear ML methods consistently outperformed linear models, with TabPFN achieving the best performance, underscoring the inherently complex relationship between the predictors and crop yield.
5. Robustness in Anomalous Years: SIF-based models (especially GOSIF) showed superior robustness in challenging years (e.g., 2016), maintaining better performance when VIs struggled.
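The findings above are reported as $R^2$ values, and finding 5 concerns performance in individual years. A minimal sketch of how $R^2$ could be computed per year is given below; the record layout and all numbers are made up for illustration and are not taken from the dataset or from the authors' evaluation code.

```python
from collections import defaultdict


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def r_squared_by_year(records):
    """Group (year, observed, predicted) records and score each year."""
    by_year = defaultdict(lambda: ([], []))
    for year, obs, pred in records:
        by_year[year][0].append(obs)
        by_year[year][1].append(pred)
    return {
        year: r_squared(obs, pred)
        for year, (obs, pred) in sorted(by_year.items())
    }


# Made-up yields (ton/ha): predictions track 2015 well but collapse to
# the mean in 2016, which drives the 2016 score to zero.
records = [
    (2015, 5.0, 5.1), (2015, 6.0, 5.9), (2015, 7.0, 7.0),
    (2016, 4.0, 5.0), (2016, 5.0, 5.0), (2016, 6.0, 5.0),
]
scores = r_squared_by_year(records)
for year, score in scores.items():
    print(year, round(score, 2))
```

Scoring each year separately makes a drop in an anomalous year visible, whereas a single pooled $R^2$ can hide it.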
How to Interpret & Use This Data:
This dataset is a valuable resource for agricultural science, remote sensing, and environmental modeling. It can be used to:
• Validate/benchmark new yield prediction models against comprehensive real-world data.
• Investigate spatio-temporal yield patterns and underlying environmental drivers.
• Explore relationships between climate, remote sensing, and yield.
• Develop/refine agricultural management strategies by understanding yield-influencing factors.
• Study the impacts of extreme weather events on winter wheat productivity.
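For the benchmarking use case, one common way to split county-year data is to hold out one year at a time, so that a model is always tested on a year it has not seen. The sketch below illustrates that split. The description does not state which validation scheme the authors used, so this is an assumption, and `leave_one_year_out` is a hypothetical helper name.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) per distinct year.

    `years` holds the year of each row, in row order.
    """
    for held_out in sorted(set(years)):
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train_idx, test_idx


# Toy example: five rows spanning three years.
years = [2005, 2005, 2006, 2006, 2007]
splits = list(leave_one_year_out(years))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

The index lists can be used to slice feature and target arrays for any model being benchmarked.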
Created:
2025-06-26



