jonghanko/Cornbelt_dataset_n_Deep_Learning_Framework
Source: Hugging Face. Updated 2026-04-24; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/jonghanko/Cornbelt_dataset_n_Deep_Learning_Framework
Resource description:
---
annotations_creators:
- expert-generated
language:
- en
license:
- cc-by-4.0
task_categories:
- tabular-regression
- image-to-image
tags:
- agriculture
- remote-sensing
- crop-modeling
- deep-learning
- maize
- yield-prediction
- US-Corn-Belt
- agroecosystem
- MODIS
- AgERA5
- CORDEX
pretty_name: US Corn Belt Maize Yield Dataset and Deep Learning Framework (2012–2020)
size_categories:
- n>1T
---
# US Corn Belt Maize Yield Dataset and Deep Learning Framework (2012–2020)
This repository contains the dataset and deep-learning scripts associated with the study:
> Jeong, S., Ko, J., Shin, T., Ban, J.-O., Wie, J., Yeom, J.-M. **Integrating deep learning and satellite imagery for spatiotemporal maize yield prediction in the US Corn Belt.** *International Journal of Applied Earth Observation and Geoinformation* (submitted).
## Overview
We pair optimization-based assimilation of MODIS-derived leaf area index (LAI) into the process-based **Remote Sensing-integrated Crop Model (RSCM)** with five deep-learning regressors: Feed-Forward Neural Network (FFNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Transformer. The framework maps maize yield at 500-m spatial resolution across seven US Corn Belt states (Iowa, Illinois, Indiana, Minnesota, Nebraska, Ohio, South Dakota) for 2012–2020 and is evaluated against state-level yield statistics. The GRU produced the most stable out-of-sample performance (mean Nash–Sutcliffe efficiency, NSE = 0.92, on the 2020 holdout).
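For readers unfamiliar with the NSE score quoted above, the following is a minimal sketch of the standard Nash–Sutcliffe efficiency formula in plain Python. The function name and the sample yields are hypothetical and chosen for illustration; they are not taken from the paper or the scripts in this repository.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Return NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    residual_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    if total_ss == 0:
        raise ValueError("NSE is undefined when all observed values are identical")
    return 1.0 - residual_ss / total_ss


# Hypothetical state-level yields (t/ha), for illustration only.
observed_yield = [11.0, 12.0, 10.0, 13.0]
predicted_yield = [11.5, 11.5, 10.5, 12.5]
nse = nash_sutcliffe_efficiency(observed_yield, predicted_yield)
print(f"NSE = {nse:.2f}")
```

An NSE of 1 means a perfect match, while 0 means the predictions are no better than always predicting the observed mean.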
The repository provides:
1. **Core dataset (~271 GB, zipped):** processed MODIS reflectance and land-surface temperature, AgERA5 meteorology, and CORDEX-North America climate projections at 500-m resolution across the seven-state domain.
2. **Analysis scripts (unzipped):** training, inference, and visualization code for the three modeling workflows described in the paper.
## Repository structure
- `Cornbelt_dataset.zip` — Processed inputs (folder name inside archive: `MODIS_CornBelt_2012_to_2020`):
  - MODIS MOD09A1 Collection 6.1 (8-day surface reflectance, 500 m)
  - MODIS MOD11A1 Collection 6.1 (daily land-surface temperature, resampled from 1 km to 500 m)
  - AgERA5 daily meteorology (downward surface solar radiation, 2-m maximum and minimum air temperature), resampled from 0.1° (~9 km) to 500 m
  - CORDEX-North America projections from GERICS-REMO2015 forced by MPI-ESM-LR and NorESM1-M, under RCP 2.6 and RCP 8.5, for the baseline (2006–2025), 2050s (2040–2060), and 2090s (2080–2100)
- `scripts_Climate_n_LAI/` — LAI estimation from climate drivers (daily max/min temperature and solar radiation)
- `Scripts_RSCM_sim_growth_n_climate_to_Yield/` — yield prediction and spatial mapping using RSCM-simulated growth variables combined with climate inputs (hybrid RSCM-ML configuration; used for yield validation in the paper)
- `Scripts_Climate_n_LAI_to_Yield/` — yield prediction and spatial mapping from climate inputs only (configuration used for future CORDEX-driven projections)
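The MOD09A1 reflectance bands in the archive are the usual inputs for the vegetation indices this card lists as key variables (NDVI, MTVI1, OSAVI, RDVI). The sketch below uses commonly published forms of these indices; the exact formulations used by the authors may differ (for example, some OSAVI variants multiply by 1.16). It also assumes the 0.0001 scale factor of the original MOD09A1 product, which may already have been applied in the processed files. The function name and pixel values are hypothetical.

```python
import math

MOD09A1_SCALE = 0.0001  # scale factor of the original MOD09A1 product (assumed not yet applied)


def vegetation_indices(green: float, red: float, nir: float) -> dict:
    """Compute NDVI, MTVI1, OSAVI, and RDVI from surface reflectance in [0, 1]."""
    if nir + red <= 0:
        raise ValueError("nir + red must be positive")
    ndvi = (nir - red) / (nir + red)
    mtvi1 = 1.2 * (1.2 * (nir - green) - 2.5 * (red - green))
    osavi = (nir - red) / (nir + red + 0.16)  # variant without the 1.16 multiplier
    rdvi = (nir - red) / math.sqrt(nir + red)
    return {"NDVI": ndvi, "MTVI1": mtvi1, "OSAVI": osavi, "RDVI": rdvi}


# Hypothetical scaled integers for one pixel
# (MOD09A1 band 4 = green, band 1 = red, band 2 = near infrared).
raw = {"green": 800, "red": 600, "nir": 4200}
reflectance = {name: value * MOD09A1_SCALE for name, value in raw.items()}
indices = vegetation_indices(**reflectance)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```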
## Dataset details
- **Spatial coverage:** seven US Corn Belt states — Iowa, Illinois, Indiana, Minnesota, Nebraska, Ohio, South Dakota (~1.5 million km², ~1.1 million 500-m cropland pixels)
- **Spatial resolution:** 500 m, Albers Equal Area Conic projection
- **Temporal range:** 2012–2020 (historical); 2006–2025, 2040–2060, and 2080–2100 (CORDEX projections)
- **Key variables:** Leaf Area Index (LAI), above-ground biomass, maize yield, NDVI, MTVI1, OSAVI, RDVI, solar radiation, maximum and minimum air temperature, land-surface temperature
- **Reference yields:** USDA National Agricultural Statistics Service (USDA-NASS) state-level maize yield, 2012–2020 (63 state-year combinations)
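The 63 state-year combinations follow from seven states times nine years. The sketch below enumerates them and separates the 2020 holdout mentioned in the overview. It assumes that all remaining years are available for model fitting, which this card does not state explicitly; the variable names are illustrative.

```python
STATES = ["Iowa", "Illinois", "Indiana", "Minnesota", "Nebraska", "Ohio", "South Dakota"]
YEARS = range(2012, 2021)  # 2012 through 2020 inclusive
HOLDOUT_YEAR = 2020

# One (state, year) pair per USDA-NASS reference yield.
state_years = [(state, year) for state in STATES for year in YEARS]

# Assumed split: every year except the holdout is used for fitting.
train = [pair for pair in state_years if pair[1] != HOLDOUT_YEAR]
test = [pair for pair in state_years if pair[1] == HOLDOUT_YEAR]

print(len(state_years), len(train), len(test))  # prints: 63 56 7
```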
## Intended use
This dataset is suitable for research on regional-scale crop-yield prediction, remote-sensing–based agroecosystem monitoring, hybrid process-based + machine-learning modeling, and climate-change impact assessment for maize systems. It is intended for research and educational purposes.
## Out-of-scope use
The 500-m "observed yield" maps were produced by disaggregating USDA-NASS state totals in proportion to RSCM-simulated pixel yield and are **not** independent pixel-level observations. Quantitative accuracy statements in the paper are therefore made at the state-year aggregation level. Users should not treat the pixel-level reference maps as ground truth; county-level USDA-NASS survey data (not included here) provide statistically independent validation.
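To illustrate why the pixel-level reference maps are not independent observations, here is a minimal sketch of proportional disaggregation. It assumes equal-area pixels and rescales simulated pixel yields so that their mean matches the reported state yield; the authors' exact procedure may differ, and the function name and numbers are hypothetical.

```python
from typing import List


def disaggregate_state_yield(state_yield: float, simulated_pixel_yield: List[float]) -> List[float]:
    """Scale simulated pixel yields so that their mean equals the reported state yield."""
    if not simulated_pixel_yield:
        raise ValueError("need at least one pixel")
    mean_sim = sum(simulated_pixel_yield) / len(simulated_pixel_yield)
    if mean_sim <= 0:
        raise ValueError("mean simulated yield must be positive")
    scale = state_yield / mean_sim  # one constant per state and year
    return [value * scale for value in simulated_pixel_yield]


# Hypothetical values (t/ha), for illustration only.
simulated = [8.0, 10.0, 12.0]
reference = disaggregate_state_yield(11.0, simulated)
print([round(value, 2) for value in reference])
```

Every reference pixel is the simulated value multiplied by a single state-wide constant, so the within-state spatial pattern comes entirely from the crop model rather than from measurements.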
The climate-projection outputs use a climate-only input configuration that differs from the validated hybrid RSCM-ML configuration; projections should be read as indicative scenario analyses rather than calibrated forecasts. CO₂ fertilization and irrigation management are not represented.
## Installation and usage
Download via `huggingface-cli`:
```bash
huggingface-cli download jonghanko/Cornbelt_dataset_n_Deep_Learning_Framework --repo-type dataset
```
Unzip the core dataset archive and run the scripts in the relevant workflow directory. See the `README` file inside each script folder for environment setup and execution instructions. The scripts were developed against Python 3.8 and PyTorch 1.13.1, and training requires a CUDA-capable GPU (an NVIDIA A100 was used in the paper).
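The download and unzip steps can also be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and takes the repository ID from this page's download link; the helper names `download_repo` and `extract_archive` are hypothetical and not part of the repository's scripts.

```python
import zipfile

REPO_ID = "jonghanko/Cornbelt_dataset_n_Deep_Learning_Framework"


def download_repo(local_dir: str) -> str:
    """Download the dataset repository; requires `pip install huggingface_hub`."""
    # Imported lazily so the extraction helper works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=local_dir)


def extract_archive(archive_path: str, target_dir: str) -> list:
    """Extract a zip archive and return its sorted top-level entry names."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        names = archive.namelist()
    return sorted({name.split("/")[0] for name in names})


if __name__ == "__main__":
    repo_dir = download_repo("cornbelt_repo")  # hypothetical local folder
    print(extract_archive(f"{repo_dir}/Cornbelt_dataset.zip", "cornbelt_data"))
```

The zipped archive alone is about 271 GB, so check free disk space for both the download and the extracted files before running this.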
## License
Dataset (`Cornbelt_dataset.zip`): **CC BY 4.0**.
Scripts (the three script directories listed under Repository structure): **MIT License** (see the `LICENSE` file in each script directory).
## Citation
If you use this dataset or the accompanying scripts, please cite:
```bibtex
@article{Jeong2026CornBelt,
author = {Jeong, Seungtaek and Ko, Jonghan and Shin, Taewhan and Ban, Jong-oh and Wie, Jieun and Yeom, Jong-Min},
title = {Integrating deep learning and satellite imagery for spatiotemporal maize yield prediction in the US Corn Belt},
journal = {Nature Food},
year = {2026},
note = {Submitted}
}
```
Please also cite the underlying data providers:
- MODIS MOD09A1 / MOD11A1: NASA LP DAAC
- AgERA5: Boogaard et al. (2020), ECMWF Copernicus Climate Change Service
- CORDEX-North America: Copernicus Climate Change Service (2020), `doi:10.24381/cds.bc91edc3`
- USDA-NASS Quick Stats: `https://quickstats.nass.usda.gov/`
## Contact
Jonghan Ko (corresponding author)
Applied Plant Science, Chonnam National University, Gwangju, South Korea
Email: jonghan.ko@jnu.ac.kr
## Acknowledgements
See the acknowledgements section of the associated manuscript.
Provider:
jonghanko



