
NEON Tree Species Predictions

Source: Mendeley Data. Updated 2024-06-29; indexed 2024-06-27.
Download link:
https://zenodo.org10581546
Description:
# Individual Tree Predictions for 100 million trees in the National Ecological Observatory Network

Preprint: https://www.biorxiv.org/content/10.1101/2023.10.25.563626v1

## Manuscript Abstract

The ecology of forest ecosystems depends on the composition of trees. Capturing fine-grained information on individual trees at broad scales allows an unprecedented view of forest ecosystems, forest restoration and responses to disturbance. To create detailed maps of tree species, airborne remote sensing can cover areas containing millions of trees at high spatial resolution. Individual tree data at wide extents promise to increase the scale of forest analysis, biogeographic research, and ecosystem monitoring without losing details on individual species composition and abundance. Computer vision using deep neural networks can convert raw sensor data into predictions of individual tree species using ground-truthed data collected by field researchers. Using over 40,000 individual tree stems as training data, we create landscape-level species predictions for over 100 million individual trees for 24 sites in the National Ecological Observatory Network. Using hierarchical multi-temporal models fine-tuned for each geographic area, we produce open-source data available as 1 km^2 shapefiles with individual tree species prediction, as well as crown location, crown area and height of 81 canopy tree species. Site-specific models had an average performance of 79% accuracy covering an average of six species per site, ranging from 3 to 15 species. All predictions were uploaded to Google Earth Engine to benefit the ecology community and overlay with other remote sensing assets. These data can be used to study forest macro-ecology, functional ecology, and responses to anthropogenic change.

## Data Summary

Each NEON site is a single zip archive with tree predictions for all available data. For site abbreviations see: https://www.neonscience.org/field-sites/explore-field-sites.

For each site, there is a .zip and a .csv. The .zip is a set of 1 km .shp tiles. The .csv contains all trees for the site in a single file.

## Prediction metadata

*Geometry* A four-pointed bounding box location in UTM coordinates.

*indiv_id* A unique crown identifier that combines the year, site and geoindex of the NEON airborne tile. The geoindex (e.g. 732000_4707000) is the UTM coordinate of the top left of the tile.

*sci_name* The full Latin name of the predicted species, aligned with NEON's taxonomic nomenclature.

*ens_score* The confidence score of the species prediction. This score is the output of the multi-temporal model for the ensemble hierarchical model.

*bleaf_taxa* Highest predicted category for the broadleaf submodel.

*bleaf_score* The confidence score for the broadleaf taxa submodel.

*oak_taxa* Highest predicted category for the oak model.

*dead_label* A two-class alive/dead classification based on the RGB data. 0=Alive/1=Dead.

*dead_score* The confidence score of the Alive/Dead prediction.

*site_id* The four letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.

*conif_taxa* Highest predicted category for the conifer model.

*conif_score* The confidence score for the conifer taxa submodel.

*dom_taxa* Highest predicted category for the dominant taxa submodel.

*dom_score* The confidence score for the dominant taxa submodel.

## Training data

The crops.zip contains pre-cropped files. The 369 band hyperspectral files are numpy arrays. RGB crops are .tif files. The naming format is `<individualID>_<year>_<sensor>`. For example, "NEON.PLA.D07.GRSM.00583_2022_RGB.tif" is the RGB crop of the predicted crown of NEON data from Great Smoky Mountains National Park (GRSM), flown in 2022. Along with the crops are .csv files for the various train-test split experiments in the manuscript.

### Crop metadata

There are 30,042 individuals in the annotations.csv file. We keep all data, but we recommend a filtering step of at least 20 records per species to reduce the chance of taxonomic or data cleaning errors. This leaves 132 species.

*score* The DeepForest crown score for the crop.

*taxonID* Four letter species code. See the NEON plant taxonomy for the scientific name: https://data.neonscience.org/taxonomic-lists

*individual* Unique individual identifier for a given field record and crown crop.

*siteID* The four letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.

*plotID* NEON plot ID within the site. For more information on NEON sampling see: https://www.neonscience.org/data-samples/data-collection/observational-sampling/site-level-sampling-design

*CHM_height* The LiDAR derived height for the field sampling point.

*image_path* Relative pathname for the hyperspectral array, which can be read by numpy.load. The array format is 369 bands * Height * Width.

*tile_year* Flight year of the sensor data.

*RGB_image_path* Relative pathname for the RGB array, which can be read by rasterio.open().

# Code repository

The predictions were made using the DeepTreeAttention repo: https://github.com/weecology/DeepTreeAttention

Key files include the model definition for a [single year model](https://github.com/weecology/DeepTreeAttention/blob/main/src/models/Hang2020.py) and [Data preprocessing](https://github.com/weecology/DeepTreeAttention/blob/cae13f1e4271b5386e2379068f8239de3033ec40/src/utils.py#L59).
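
As a rough illustration of how the prediction columns described under Prediction metadata might be used, the sketch below keeps only crowns labelled alive (`dead_label` of 0) whose `ens_score` clears a threshold. The column names come from the field list; the helper name `confident_live_trees`, the 0.5 threshold, and the sample rows are assumptions made for illustration and are not recommendations from the dataset authors.

```python
import csv
import io


def confident_live_trees(rows, min_score=0.5):
    """Yield prediction rows labelled alive with ens_score >= min_score.

    `min_score` is an arbitrary illustrative threshold, not a value
    suggested by the dataset authors.
    """
    for row in rows:
        # dead_label is documented as 0=Alive / 1=Dead; go through float
        # in case the CSV stores it as "0.0" rather than "0".
        is_alive = int(float(row["dead_label"])) == 0
        if is_alive and float(row["ens_score"]) >= min_score:
            yield row


# Synthetic stand-in for one site's .csv; every value here is invented.
sample_csv = io.StringIO(
    "indiv_id,sci_name,ens_score,dead_label,site_id\n"
    "tree_a,Species one,0.91,0,XXXX\n"
    "tree_b,Species two,0.32,0,XXXX\n"
    "tree_c,Species one,0.88,1,XXXX\n"
)
kept = list(confident_live_trees(csv.DictReader(sample_csv), min_score=0.5))
print([row["indiv_id"] for row in kept])
```

For a real site file, the same generator could be fed `csv.DictReader(open(path, newline=""))`, assuming the per-site .csv uses the column names listed above as its header.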

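
The crop naming format `<individualID>_<year>_<sensor>` given under Training data can be split back into its parts. The following is a minimal sketch based only on the single example file name in the description; the `CropName` type and `parse_crop_name` helper are hypothetical names, and splitting from the right assumes the year and sensor parts never contain underscores.

```python
from pathlib import Path
from typing import NamedTuple


class CropName(NamedTuple):
    """Parts of a crop file name (hypothetical helper type)."""
    individual_id: str
    year: int
    sensor: str


def parse_crop_name(filename):
    """Split '<individualID>_<year>_<sensor>.<ext>' into its three parts."""
    stem = Path(filename).stem  # drops only the final extension, e.g. ".tif"
    # Split from the right so any underscore inside the individual ID survives.
    individual_id, year, sensor = stem.rsplit("_", 2)
    return CropName(individual_id, int(year), sensor)


crop = parse_crop_name("NEON.PLA.D07.GRSM.00583_2022_RGB.tif")
print(crop.individual_id, crop.year, crop.sensor)
```

The sensor label used for the hyperspectral arrays is not stated in the description, so the sketch makes no assumption about it and simply returns whatever follows the year.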
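
The filtering step recommended under Crop metadata (at least 20 records per species) could be sketched as follows. This assumes annotations.csv has a `taxonID` column, as the field list suggests; the function names and the synthetic rows are illustrative only and are not taken from the DeepTreeAttention code.

```python
import csv
from collections import Counter


def load_annotations(path):
    """Read an annotations CSV into a list of dicts, one per record."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def filter_rare_species(rows, min_records=20, species_column="taxonID"):
    """Keep only rows whose species appears at least `min_records` times."""
    counts = Counter(row[species_column] for row in rows)
    return [row for row in rows if counts[row[species_column]] >= min_records]


# Synthetic rows standing in for annotations.csv; the taxonID codes are made up.
example_rows = (
    [{"taxonID": "AAAA", "siteID": "XXXX"}] * 25
    + [{"taxonID": "BBBB", "siteID": "XXXX"}] * 3
)
kept_rows = filter_rare_species(example_rows, min_records=20)
print(len(kept_rows), sorted({row["taxonID"] for row in kept_rows}))
```

On the real file, `filter_rare_species(load_annotations("annotations.csv"))` would be the equivalent call, with the path adjusted to wherever crops.zip was extracted.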
Created: 2024-02-10
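Background overview

The NEON Tree Species Predictions dataset provides detailed predictions for 100 million trees in the National Ecological Observatory Network, including species, location and structural attributes for 81 canopy tree species. The predictions were produced by deep learning models applied to airborne remote sensing data, with an average accuracy of 79%, and can be used for research in forest ecology, functional ecology and related fields.

The summary above was compiled and generated by 遇见数据集.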