
Satellite-derived crop field boundaries in heterogeneous smallholder-dominated regions in the North of Mozambique

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/11488975
Resource description:
Satellite-based field delineation has evolved rapidly thanks to recent advances in machine learning for computer vision. However, the scarcity of labeled data for complex and dynamic smallholder landscapes remains a major bottleneck for operational field delineation and downstream applications, particularly in Sub-Saharan Africa. Here we provide reference field boundaries collected within a research project funded by the F.R.S.-FNRS. A detailed description of the data can be found in our corresponding pre-print.

The dataset contains multiple files:

sites.gpkg: Vector dataset containing the sampling sites covered by the dataset. We covered 513 sites of 600 × 600 m, or 36 ha each.

human_fields_train.gpkg: 1,518 field delineations distributed across 200 sites. Individual fields were manually digitized based on very high resolution (VHR) satellite imagery in Google Earth Pro. Each polygon contains the corresponding image acquisition date. These fields were used for model training in the paper.

human_fields_test.gpkg: 2,199 field delineations distributed across 313 sites. Individual fields were manually digitized based on VHR satellite imagery in Google Earth Pro. Each polygon contains the corresponding image acquisition date. These fields were used for the evaluation of all experiments described in the paper.

pseudo_fields_train.gpkg: 766 pseudo labels obtained from predictions of a pre-trained FracTAL ResUNet model. The pseudo labels correspond to the selection using P99(SemCN). For other sets of pseudo labels, please contact us.

Brief description of methods

For site selection, we developed a stratified random sampling scheme to sample from regions with actively used cropland. To identify these regions, we used an existing map of active and fallow cropland for the growing season of September 2020 through August 2021 (Rufin et al., 2022). We aggregated the map to a 1 ha grid and calculated the proportion of active cropland in each grid cell.
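The aggregation step described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `cropland_proportions`, the binary mask layout, and the 10 m pixel size (so that 10 × 10 pixels form one hectare) are all assumptions made for illustration, since the resolution of the source map is not stated here.

```python
def cropland_proportions(mask, block):
    """Aggregate a binary active-cropland mask to coarser grid cells.

    mask  : list of equal-length rows of 0/1 pixels (1 = active cropland)
    block : pixels per cell side; with the ASSUMED 10 m pixels,
            block=10 would correspond to 1 ha cells
    Returns a grid holding the share of active pixels in each cell.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    if n_rows % block or n_cols % block:
        raise ValueError("mask dimensions must be multiples of block")

    proportions = []
    for r0 in range(0, n_rows, block):
        out_row = []
        for c0 in range(0, n_cols, block):
            # Count active pixels inside the block x block window.
            active = sum(
                mask[r][c]
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            )
            out_row.append(active / (block * block))
        proportions.append(out_row)
    return proportions


# Toy 4 x 4 mask aggregated with 2 x 2 blocks (hypothetical values).
toy_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
]
toy_proportions = cropland_proportions(toy_mask, block=2)
```

Each value in the returned grid is the fraction of active cropland in one cell, which is the quantity the sampling scheme stratifies on.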
We sampled 1,000 sites from regions mapped as containing at least 50% active cropland within a one-hectare grid cell. We defined a site extent of 600 by 600 meters, or 36 ha, to ensure that a sufficient number of fields could be delineated, even in regions with comparatively large fields. The selected sites were screened for VHR image quality and acceptable visibility of at least five fields, resulting in 513 sample sites.

For human labels, we tasked annotators with collecting sparse labels (i.e., at least five fields) per site. The interpreters collected only fields containing non-tree crops, systematically excluding tree crop plantations from our data. While individual trees in the field interior were included in our labels, trees overlapping the field boundaries were avoided, and the tree canopy was treated as the field boundary when completing the labels. All field delineations underwent an iterative quality assessment, in which 18% of the initial field delineations and 7% of the field delineations in a second iteration were discarded.

The pseudo labels provided here were selected as the 1% most confident predictions from the pre-trained model. Confidence scores were computed as the median of all pixel-level field extent probabilities for each field instance.

For more details, please read the corresponding paper: Rufin, Wang, Lisboa, Hemmerling, Tulbure & Meyfroidt (2023). Taking it further: Leveraging pseudo labels for field delineation across label-scarce smallholder regions. https://doi.org/10.48550/ARXIV.2312.08384
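The pseudo-label selection described above (median pixel probability per field instance, keeping the most confident 1%) can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation: the function names, the dictionary input format, and the choice to round the cut-off up and keep at least one instance are assumptions.

```python
import math
from statistics import median


def instance_confidence(pixel_probs):
    """Confidence of one field instance: median of its pixel-level
    field extent probabilities, as stated in the description."""
    return median(pixel_probs)


def select_most_confident(instances, percent=1.0):
    """Keep the `percent` % most confident field instances.

    instances : dict mapping an instance id to a list of pixel probabilities
    percent   : share of instances to keep (1.0 means the top 1%)
    ASSUMPTION: the cut-off count is rounded up and at least one
    instance is always kept; the source does not specify this.
    """
    scored = sorted(
        ((instance_confidence(probs), fid) for fid, probs in instances.items()),
        reverse=True,
    )
    n_keep = max(1, math.ceil(len(scored) * percent / 100))
    return [fid for _, fid in scored[:n_keep]]


# Hypothetical example: 200 instances with increasing confidence.
toy_instances = {i: [i / 1000.0] * 3 for i in range(200)}
selected = select_most_confident(toy_instances, percent=1.0)
```

A rank-based cut is used here for simplicity; thresholding at the 99th percentile of the confidence scores would be an equivalent formulation up to tie handling.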
Created:
2024-06-07