Spatial distribution data set of tea plantations with 10 m resolution from 2000 to 2020 in Fujian Province

Mendeley Data; updated 2023-06-14; indexed 2024-06-27

Download link:

https://www.doi.org/10.57760/sciencedb.j00001.00825

Resource description:

This dataset maps the spatial distribution of tea gardens in Fujian Province from 2000 to 2020 at a spatial resolution of 10 m, in the projected coordinate system WGS_1984_UTM_Zone_50N. A time node is set every five years, giving tea garden data for five years: 2000, 2005, 2010, 2015, and 2020. All results are stored in a folder named "FJ_tea_10m". The files in the package are polygon shapefiles (.shp) named "tea_date.shp", where "date" is the year; for example, the 2020 tea garden spatial distribution data is named "tea_2020.shp".

The processing workflow consists of five parts: data preprocessing, feature extraction, feature optimization, tea plantation classification, and derivation of the time series dataset with the interference data removal method.

1. Data preprocessing
Sentinel-1 (S1) data are preprocessed with the Sentinel-1 toolbox, which includes orbit parameter correction, removal of boundary and thermal noise, radiometric calibration, and related steps, and the images are then composited. Sentinel-2 (S2) data are mainly processed for cloud removal, yielding 112 cloud-free S2 images, which are then composited. The relevant functions in Google Earth Engine (GEE) are used to resample the terrain data from a spatial resolution of 30 m to 10 m, and the result is clipped to the administrative boundary of Fujian Province. Long time series Landsat images acquired from July to October of each year from 2000 to 2020 are screened on the GEE cloud platform and cloud-masked using the QA bands, and cloud-covered data are replaced with data from neighboring months.

2. Feature extraction
The different data sources were analyzed and three types of feature variables were constructed: 26 spectral features, 68 texture features, and 4 terrain features.

3. Feature selection
To obtain more accurate tea garden extraction results, four experimental schemes were designed: (1) spectral features; (2) spectral features + texture features; (3) spectral features + texture features + terrain features; and (4) SVM_RFE feature selection. In Scheme 4, the SVM_RFE feature selection algorithm selects the most important feature variables for tea garden extraction, which avoids the low extraction accuracy and efficiency caused by feature redundancy.

4. Classification of tea gardens
A support vector machine classifier is used to classify tea gardens, and a confusion matrix is then used to evaluate the accuracy of the four classification schemes. The main reference values are producer accuracy, user accuracy, overall accuracy, and the Kappa coefficient. The accuracy verification results show that Scheme 4, after feature optimization, has the highest extraction accuracy. This yields a 10 m resolution thematic spatial distribution map of tea gardens in Fujian Province for 2020.

5. Obtaining the time series dataset with the interference data removal method
Field investigations found that the area of tea gardens in Fujian Province has increased continuously over the past 20 years and that their distribution has kept expanding. This dataset therefore adopts the interference data removal method: the vegetation interference (disturbance) information is used to mask the tea garden distribution results for 2020 and earlier, working backward from 2020 to obtain, in turn, the spatial distribution of tea gardens in 2015, 2010, 2005, and 2000.

The specific steps of the interference data removal method are as follows. Based on long-term Landsat series satellite data, the LandTrendr algorithm is run on the GEE cloud platform to detect changes in the Landsat time series, obtain the time nodes of vegetation disturbance and restoration, and divide the vegetation in Fujian Province into interference and non-interference areas. An interference node is set every 5 years to merge the vegetation interference information, giving vegetation interference information for 2000 to 2004, 2005 to 2009, 2010 to 2014, and 2015 to 2019. Taking the 2015 tea garden data as an example: based on the prior knowledge that tea gardens have been expanding, the vegetation interference information for 2015 to 2019 is overlaid on the 2020 tea garden thematic map, and the patches within the tea garden thematic map that overlap with the interference information are removed, giving the spatial distribution of tea gardens in 2015. Applying the same overlay analysis and elimination to the successively earlier tea garden data yields the tea garden spatial distribution for 2010, 2005, and 2000.

On-site investigation and verification found that although tea gardens show a gradual expansion trend, the annual changes are very small, and only a few counties and cities show relatively large changes. This dataset uses the 2020 tea garden data, at a resolution of 10 m, as the mask object. After the masked results were imported into Google Earth for verification, the tea garden data before 2020 were found to be basically close to a resolution of 10 m. The tea garden data for 2000 to 2015 obtained through the interference data removal method can therefore be considered to have a resolution of 10 m.
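The folder and file naming convention described above can be turned into a small lookup of expected file paths. The following is a minimal sketch using only the Python standard library; the function name `tea_layer_paths` is hypothetical, and the EPSG code is given on the assumption that WGS_1984_UTM_Zone_50N corresponds to EPSG:32650 (WGS 84 / UTM zone 50N).

```python
from pathlib import Path

# Years covered by the dataset: one layer every five years (from the description).
YEARS = (2000, 2005, 2010, 2015, 2020)

# Assumption: WGS_1984_UTM_Zone_50N corresponds to EPSG:32650.
EPSG_CODE = 32650


def tea_layer_paths(root="FJ_tea_10m"):
    """Map each year to its expected shapefile path, e.g. FJ_tea_10m/tea_2020.shp.

    Hypothetical helper: it only builds paths and does not check that the files exist.
    """
    folder = Path(root)
    return {year: folder / f"tea_{year}.shp" for year in YEARS}


paths = tea_layer_paths()
for year, path in paths.items():
    print(year, path.as_posix())
```

If GeoPandas is installed, a path from this lookup could be passed to `geopandas.read_file` to load a layer; that step is left out here so the sketch has no third-party dependencies.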
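The accuracy measures named in step 4 (producer accuracy, user accuracy, overall accuracy, and the Kappa coefficient) can all be derived from a confusion matrix. The sketch below uses the standard definitions, with rows as reference classes and columns as mapped classes. The two-class matrix is made-up illustration data, not a result from this dataset, and `accuracy_metrics` is a hypothetical helper name.

```python
def accuracy_metrics(matrix):
    """Compute accuracy measures from a square confusion matrix.

    matrix[i][j] is the number of samples whose reference class is i
    and whose mapped (predicted) class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    ref_totals = [sum(matrix[i]) for i in range(n)]                       # row sums
    map_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # column sums

    overall = correct / total
    # Producer accuracy: share of each reference class that was mapped correctly.
    producer = [matrix[i][i] / ref_totals[i] for i in range(n)]
    # User accuracy: share of each mapped class that is correct on the ground.
    user = [matrix[j][j] / map_totals[j] for j in range(n)]
    # Kappa: agreement beyond what is expected by chance.
    expected = sum(ref_totals[k] * map_totals[k] for k in range(n)) / total**2
    kappa = (overall - expected) / (1 - expected)
    return {"overall": overall, "producer": producer, "user": user, "kappa": kappa}


# Made-up example with classes [tea, non-tea]; not taken from the dataset.
example = [[45, 5],
           [10, 40]]
metrics = accuracy_metrics(example)
print(metrics)
```

For this made-up matrix the overall accuracy is 0.85 and the Kappa coefficient is 0.7, since the chance agreement is 0.5.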
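The backward masking logic of the interference data removal method in step 5 can be illustrated with plain Python sets. This is a simplified sketch, not the authors' implementation: it assumes each tea garden patch has a hypothetical ID and at most one detected disturbance year, whereas the actual workflow overlays spatial layers. The names `WINDOWS` and `backward_mask` are illustrative only.

```python
# Five-year interference windows from the description, keyed by the year of
# the tea garden layer that each window produces.
WINDOWS = {
    2015: (2015, 2019),
    2010: (2010, 2014),
    2005: (2005, 2009),
    2000: (2000, 2004),
}


def backward_mask(tea_2020, disturbance_year):
    """Derive earlier tea garden layers from the 2020 layer.

    tea_2020: set of patch IDs mapped as tea garden in 2020.
    disturbance_year: dict of patch ID -> year in which a vegetation
        disturbance was detected (undisturbed patches are absent).
    Assumes tea gardens only expand, so a patch disturbed within a window
    is treated as not yet being tea garden at the start of that window.
    """
    layers = {2020: set(tea_2020)}
    current = set(tea_2020)
    for target in sorted(WINDOWS, reverse=True):  # 2015, 2010, 2005, 2000
        start, end = WINDOWS[target]
        disturbed = {p for p, y in disturbance_year.items() if start <= y <= end}
        current = current - disturbed  # drop patches that overlap the interference
        layers[target] = current
    return layers


# Made-up example: patch "x" is disturbed but is not tea garden in 2020.
tea_2020 = {"a", "b", "c", "d", "e"}
disturbance_year = {"b": 2017, "c": 2012, "d": 2003, "x": 2016}
layers = backward_mask(tea_2020, disturbance_year)
for year in sorted(layers):
    print(year, sorted(layers[year]))
```

Because each layer is derived from the one after it, every earlier layer is a subset of the later ones, which matches the prior knowledge of continuous expansion stated in the description.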


Created:

2023-06-14