Data mining applied to feature selection methods for aboveground carbon stock modelling

Name: Data mining applied to feature selection methods for aboveground carbon stock modelling
Creator: SciELO journals
Published: 2022-12-06 07:29:29
License: No description available

DataCite Commons | Updated: 2022-12-06 | Indexed: 2024-07-29

Download link:

https://scielo.figshare.com/articles/dataset/Data_mining_applied_to_feature_selection_methods_for_aboveground_carbon_stock_modelling/21679161

Description:

Abstract: The objective of this work was to apply the random forest (RF) algorithm to the modelling of the aboveground carbon (AGC) stock of a tropical forest by testing three feature selection procedures: recursive removal and the uniobjective and multiobjective genetic algorithms (GAs). The database used covered 1,007 plots sampled in the Rio Grande watershed, in the state of Minas Gerais, Brazil, and 114 environmental variables (climatic, edaphic, geographic, terrain, and spectral). The best feature selection strategy, RF with multiobjective GA, reached the lowest root-mean-square error, 17.75 Mg ha⁻¹, with only four spectral variables (normalized difference moisture index, normalized burn ratio 2 correlation texture, tree cover, and latent heat flux), which represents a reduction of 96.5% in the size of the database. Feature selection strategies help obtain better RF performance by improving accuracy and reducing the volume of data. Although recursive removal and the multiobjective GA performed similarly as feature selection strategies, the latter yielded the smallest subset of variables with the highest accuracy. The findings of this study highlight the importance of using near infrared, short wavelengths, and derived vegetation indices for the remote-sensing-based estimation of AGC. The MODIS products show a significant relationship with the AGC stock and should be further explored by the scientific community for the modelling of this stock.
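The two quantities a multiobjective feature selection trades off, prediction error and subset size, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it trains no random forest and runs no genetic algorithm, the function names (`rmse`, `reduction_pct`, `dominates`, `pareto_front`) are hypothetical, and the candidate values are invented for illustration. Only the figures of 114 variables and 4 selected variables come from the abstract.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def reduction_pct(n_total, n_selected):
    """Percentage of variables dropped by feature selection."""
    return 100.0 * (1 - n_selected / n_total)


def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives
    and strictly better on one. Candidates are (rmse, n_features) tuples
    and both objectives are minimized."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other != c)
    ]


# Keeping 4 of 114 variables, as reported in the abstract.
print(f"{reduction_pct(114, 4):.1f}% of variables removed")

# Hypothetical (rmse, n_features) pairs, invented for illustration only.
candidates = [(20.0, 114), (18.5, 30), (18.0, 12), (19.0, 12), (18.2, 5)]
print(pareto_front(candidates))
```

A single-objective search would rank these candidates by error alone, whereas the Pareto filter keeps every subset that cannot be improved on one objective without worsening the other. That is one plausible reading of how a multiobjective GA can end up with a very small subset at comparable accuracy.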

Provider:

SciELO journals

Created:

2022-12-06