
NEON Tree Species Predictions

Mendeley Data · Updated 2024-05-17 · Indexed 2024-06-29
Download link:
https://zenodo.org/records/10926344
Resource description:
# Individual Tree Predictions for 100 million trees in the National Ecological Observatory Network

Preprint: https://www.biorxiv.org/content/10.1101/2023.10.25.563626v1

## Manuscript Abstract

The ecology of forest ecosystems depends on the composition of trees. Capturing fine-grained information on individual trees at broad scales allows an unprecedented view of forest ecosystems, forest restoration and responses to disturbance. To create detailed maps of tree species, airborne remote sensing can cover areas containing millions of trees at high spatial resolution. Individual tree data at wide extents promises to increase the scale of forest analysis, biogeographic research, and ecosystem monitoring without losing details on individual species composition and abundance. Computer vision using deep neural networks can convert raw sensor data into predictions of individual tree species using ground-truthed data collected by field researchers. Using over 40,000 individual tree stems as training data, we create landscape-level species predictions for over 100 million individual trees for 24 sites in the National Ecological Observatory Network. Using hierarchical multi-temporal models fine-tuned for each geographic area, we produce open-source data available as 1 km^2 shapefiles with individual tree species prediction, as well as crown location, crown area and height of 81 canopy tree species. Site-specific models had an average performance of 79% accuracy covering an average of six species per site, ranging from 3 to 15 species. All predictions were uploaded to Google Earth Engine to benefit the ecology community and overlay with other remote sensing assets. These data can be used to study forest macro-ecology, functional ecology, and responses to anthropogenic change.

## Data Summary

Each NEON site is a single zip archive with tree predictions for all available data. For site abbreviations see: https://www.neonscience.org/field-sites/explore-field-sites.
For each site, there is a .zip and a .csv. The .zip is a set of 1 km .shp tiles. The .csv contains all trees in a single file.

## Prediction metadata

*Geometry* A four-pointed bounding box location in UTM coordinates.

*indiv_id* A unique crown identifier that combines the year, site and geoindex of the NEON airborne tile. The geoindex (e.g. 732000_4707000) is the UTM coordinate of the top left of the tile.

*sci_name* The full Latin name of the predicted species, aligned with NEON's taxonomic nomenclature.

*ens_score* The confidence score of the species prediction. This score is the output of the multi-temporal model for the ensemble hierarchical model.

*bleaf_taxa* Highest predicted category for the broadleaf submodel.

*bleaf_score* The confidence score for the broadleaf taxa submodel.

*oak_taxa* Highest predicted category for the oak model.

*dead_label* A two-class alive/dead classification based on the RGB data. 0=Alive/1=Dead.

*dead_score* The confidence score of the Alive/Dead prediction.

*site_id* The four-letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.

*conif_taxa* Highest predicted category for the conifer model.

*conif_score* The confidence score for the conifer taxa submodel.

*dom_taxa* Highest predicted category for the dominant taxa submodel.

*dom_score* The confidence score for the dominant taxa submodel.

## Training data

The crops.zip archive contains pre-cropped files. The 369-band hyperspectral files are numpy arrays. RGB crops are .tif files. The naming format is `<individualID>_<year>_<sensor>`. For example, "NEON.PLA.D07.GRSM.00583_2022_RGB.tif" is the RGB crop of the predicted crown of NEON data from Great Smoky Mountains National Park (GRSM), flown in 2022. Along with the crops are .csv files for the various train-test split experiments in the manuscript.

### Crop metadata

There are 30,042 individuals in the annotations.csv file.
We keep all data, but we recommend a filtering step of at least 20 records per species to reduce the chance of taxonomic or data cleaning errors. This leaves 132 species.

*score* The DeepForest crown score for the crop.

*taxonID* The letter species code. See the NEON plant taxonomy for the scientific name: https://data.neonscience.org/taxonomic-lists

*individual* Unique individual identifier for a given field record and crown crop.

*siteID* The four-letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.

*plotID* NEON plot ID within the site. For more information on NEON sampling see: https://www.neonscience.org/data-samples/data-collection/observational-sampling/site-level-sampling-design

*CHM_height* The LiDAR-derived height for the field sampling point.

*image_path* Relative pathname for the hyperspectral array, which can be read by numpy.load. The array format is 369 bands * Height * Width.

*tile_year* Flight year of the sensor data.

*RGB_image_path* Relative pathname for the RGB array, which can be read by rasterio.open().

# Code repository

The predictions were made using the DeepTreeAttention repo: https://github.com/weecology/DeepTreeAttention

Key files include the model definition for a [single year model](https://github.com/weecology/DeepTreeAttention/blob/main/src/models/Hang2020.py) and [data preprocessing](https://github.com/weecology/DeepTreeAttention/blob/cae13f1e4271b5386e2379068f8239de3033ec40/src/utils.py#L59).
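
The prediction metadata above lends itself to a simple screening step before analysis. The following is a minimal sketch, assuming the per-site .csv uses the same column names as the fields listed under "Prediction metadata". The 0.5 threshold, the helper names `filter_predictions` and `species_counts`, and every value in the example table are illustrative assumptions, not part of the dataset or the authors' workflow.

```python
import pandas as pd


def filter_predictions(trees: pd.DataFrame, min_score: float = 0.5) -> pd.DataFrame:
    """Keep crowns labelled alive whose ensemble species score is at least min_score."""
    alive = trees["dead_label"] == 0  # 0=Alive, 1=Dead per the prediction metadata
    confident = trees["ens_score"] >= min_score  # threshold is an arbitrary choice
    return trees[alive & confident]


def species_counts(trees: pd.DataFrame) -> pd.Series:
    """Count predicted crowns per scientific name."""
    return trees["sci_name"].value_counts()


# Hand-made rows standing in for a per-site .csv; all values are invented.
example = pd.DataFrame(
    {
        "indiv_id": ["a", "b", "c", "d"],
        "sci_name": ["Acer rubrum", "Acer rubrum", "Quercus alba", "Quercus alba"],
        "ens_score": [0.91, 0.42, 0.77, 0.88],
        "dead_label": [0, 0, 0, 1],
        "site_id": ["HARV"] * 4,
    }
)

kept = filter_predictions(example, min_score=0.5)
print(species_counts(kept))
```

With a downloaded site file, `pd.read_csv` on that file would take the place of the `example` table, provided the column names match.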
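
The crop naming format `<individualID>_<year>_<sensor>` described under "Training data" can be split back into its parts. This is a small sketch based only on the single RGB example given in the text. The `CropName` type and `parse_crop_name` helper are hypothetical names, and reading the site code from the fourth dot-separated field of the individual ID is an assumption inferred from that one example.

```python
from pathlib import Path
from typing import NamedTuple


class CropName(NamedTuple):
    individual_id: str
    year: int
    sensor: str


def parse_crop_name(path: str) -> CropName:
    """Split '<individualID>_<year>_<sensor>.<ext>' into its three parts."""
    stem = Path(path).stem  # drops only the final extension, e.g. '.tif'
    # Split from the right so the two trailing tokens are always year and sensor.
    individual_id, year, sensor = stem.rsplit("_", 2)
    return CropName(individual_id, int(year), sensor)


crop = parse_crop_name("NEON.PLA.D07.GRSM.00583_2022_RGB.tif")
# Assumption: the site code is the fourth dot-separated field of the individual ID.
site = crop.individual_id.split(".")[3]
print(crop, site)
```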
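
The recommended filter of at least 20 records per species can be expressed in a few lines of pandas. This sketch assumes annotations.csv has been read into a DataFrame with the `taxonID` column described under "Crop metadata". The function name `filter_rare_species` and the toy table are illustrative only.

```python
import pandas as pd


def filter_rare_species(annotations: pd.DataFrame, min_records: int = 20) -> pd.DataFrame:
    """Drop rows whose taxonID appears fewer than min_records times."""
    # Map each row's taxonID to the number of rows sharing that taxonID.
    per_row_count = annotations["taxonID"].map(annotations["taxonID"].value_counts())
    return annotations[per_row_count >= min_records]


# Toy table: one species with 25 records and one with 3 (counts are invented).
example_annotations = pd.DataFrame({"taxonID": ["ACRU"] * 25 + ["QUAL"] * 3})
filtered = filter_rare_species(example_annotations)
print(filtered["taxonID"].nunique(), len(filtered))
```

Counting rows per `taxonID` treats each annotation row as one record. If the filter should instead count unique individuals, the count would need to be taken over the `individual` column per species.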
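
Because the hyperspectral arrays are stored as 369 bands * Height * Width, code that expects a channels-last image needs to move the band axis. The following sketch checks the documented layout and reorders it. It uses a zero-filled array in place of a real crop, and the helper name `to_height_width_bands` is hypothetical.

```python
import numpy as np

N_BANDS = 369  # band count stated in the training data description


def to_height_width_bands(cube: np.ndarray) -> np.ndarray:
    """Validate a (bands, height, width) cube and return it as (height, width, bands)."""
    if cube.ndim != 3 or cube.shape[0] != N_BANDS:
        raise ValueError(f"expected shape ({N_BANDS}, H, W), got {cube.shape}")
    return np.moveaxis(cube, 0, -1)


# Stand-in for an array returned by numpy.load on an image_path entry.
fake_crop = np.zeros((N_BANDS, 11, 7), dtype=np.float32)
print(to_height_width_bands(fake_crop).shape)
```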

Created: 2024-04-08
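Background overview

This dataset provides tree species predictions for more than 100 million individual trees in the National Ecological Observatory Network (NEON), covering 24 sites. The predictions were generated from airborne remote sensing data with deep neural network models, with an average prediction accuracy of 79%. The data are provided as 1 km^2 shapefile tiles containing species, crown location, crown area and height, and are suited to research on forest macro-ecology, functional ecology and responses to anthropogenic change.

The summary above was collected and generated by 遇见数据集.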