
Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset

Source: DataONE | Updated: 2025-11-18 | Indexed: 2025-12-06
Download link:
https://search.dataone.org/view/sha256:744d9861d3580936e0f240888c3136d7adc37073f3b08e8f33e85b17abf2d0a6
Description:
This dataset is part of the Sample Earth initiative, a global effort to build open, high-quality reference data for improving the accuracy and inclusiveness of land-cover maps. It contains GPS-located land-cover samples that can be used to train and validate AI models that generate detailed, accurate maps, with a focus on coffee and cocoa production systems.

The data were collected across Vietnam and Ghana, combining expert interpretation of high-resolution satellite imagery (Google Earth, Planet) with a smaller subset of ground-truth observations. Each point is labeled and quality-controlled to represent a diverse range of land-cover types commonly found within and around smallholder production areas.

The classification scheme includes 10 main classes (such as coffee, cocoa, orchard, and natural forests) and 68 sub-classes (such as full sun coffee and coffee intercropped with black pepper). While the primary goal is to distinguish coffee and cocoa systems from other land uses, the dataset also supports broader applications such as agricultural monitoring, deforestation analysis, ecosystem service mapping, land-use planning, and suitability modeling.

By providing transparent, well-validated training data, this dataset contributes to Sample Earth's broader objective: strengthening AI-based land monitoring tools and supporting global efforts, including the EU Deforestation Regulation (EUDR), to ensure sustainable, deforestation-free agricultural supply chains.

The dataset is designed to grow continuously, incorporating new commodities, timeframes, and countries over time. The data are released under a Creative Commons Attribution–NonCommercial license. Entities wishing to use the data for commercial purposes are encouraged to contact us to establish a tailored data-sharing agreement.

Methodology: The dataset was developed primarily through expert visual interpretation of high-resolution satellite imagery from Google Earth and Planet, collected between 2019 and 2022. A smaller subset of points in the Central Highlands of Vietnam was derived from field observations, providing additional ground-truth validation. To improve interpreter accuracy and contextual understanding, field visits and Google Street View assessments were conducted in both Vietnam and Ghana. These activities helped experts better recognize local land-use patterns and distinguish among different crop and landscape types. All sample points were digitized and standardized in QGIS, with attributes including class ID, crop type, sampling date, and associated metadata to ensure consistency and interoperability. This combination of expert interpretation, localized training, and structured data management yields a consistent, machine-learning–ready dataset suitable for land-cover mapping and model training workflows.
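Because the samples are intended for both training and validating models, one common way to use labeled points like these is a class-stratified split, so that every land-cover class appears in both subsets. The following is only a minimal sketch under stated assumptions: the field names (`point_id`, `class_id`, `crop_type`, `sampling_date`), the synthetic records, and both helper functions are hypothetical illustrations, not the dataset's published schema or tooling.

```python
import random
from collections import defaultdict


def make_demo_samples():
    """Build synthetic stand-in records (NOT real Sample Earth data).

    Field names are assumptions loosely modeled on the attributes the
    description mentions (class ID, crop type, sampling date).
    """
    counts = {"coffee": 8, "cocoa": 8, "natural forest": 4}
    samples = []
    for class_id, (crop_type, n) in enumerate(counts.items(), start=1):
        for i in range(n):
            samples.append({
                "point_id": f"{crop_type}-{i}",
                "class_id": class_id,
                "crop_type": crop_type,
                "sampling_date": "2020-01-01",  # placeholder date
            })
    return samples


def stratified_split(records, key="class_id", val_fraction=0.25, seed=0):
    """Split records into (train, val), sampling each class separately."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[key]].append(rec)

    train, val = [], []
    for label in sorted(by_class):
        group = list(by_class[label])
        rng.shuffle(group)
        # Hold out at least one point per class, unless the class has
        # only a single point, in which case it stays in training.
        n_val = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


if __name__ == "__main__":
    train, val = stratified_split(make_demo_samples())
    print(len(train), "training points,", len(val), "validation points")
```

For spatial data, a purely random split can overstate accuracy when nearby points fall on both sides of the split; grouping points by region before splitting is a frequently used alternative, but which strategy suits this dataset is not stated in the description.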

Created: 2025-11-21