Supplement 1. R code and data files used to train and evaluate species distribution models (SDMs).

Source: Mendeley Data. Updated 2024-06-25; indexed 2024-06-28.

Download link:

https://wiley.figshare.com/articles/dataset/Supplement_1_R_code_and_data_files_used_to_train_and_evaluate_species_distribution_models_SDMs_/3569064

Summary:

File List

- Ecol_Monograph_supplement_code_biomod2.txt (md5: 1468e75dbf74ed624a8dce871743f924)
- Ecol_Monograph_supplement_code_dismo_1.txt (md5: 55b20fbe747f7601c53d5b56a93459ea)
- Ecol_Monograph_supplement_code_dismo_2.txt (md5: a33a1745062f1bf816c3d9ec797cdd46)
- Ecol_Monograph_supplement_code_dismo_3.txt (md5: aff301c5ba52f04eff85e561122964c4)
- Ecol_Monograph_supplement_code_dismo_4.txt (md5: 244ff730dbd9da02a5439cfd95a439ca)
- Ecol_Monograph_supplement_code_dismo_5.txt (md5: bec6a05bf1d737b941d0a7a00bde3658)
- lot_line_section_with_predictors.csv (md5: 48dc1b92e2d3d3b3e4875ef0dc3b87a7)
- township_bt_post_with_predictors.csv (md5: 86f08554a0a65fec8065f85335aa8ec5)
- township_line_section_with_predictors.csv (md5: d028af68dcd8f7bca5b28e969cc5c796)
- biomod2_predictors.zip (md5: 7ab5a1d2ef1847fe64a47483e8220d70)

Description

This supplement contains the data and code that were used to train and evaluate species distribution models (SDMs). It includes six .txt files containing code to be run in R, three .csv files containing the training and evaluation data, and a .zip file of predictor rasters. All code files include comments (“#...”) that describe what the code does.

Two notes apply to the code files. First, users seeking to recreate the results must make minor edits to the code so that every pathname it references matches the location where they have stored the data files. Second, the code as presented trains SDMs that include Native American variables (NAVs); running SDMs that exclude NAVs requires a few minor edits, which are documented in the comments of the code files. Both kinds of edit are small and should take little time. Training and evaluating the models, however, requires considerable processing time.
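The supplement's own code is written in R; the sketch below uses Python purely for illustration. It shows one way to handle both the checksum list and the pathname edits: keep every file in a single folder and check each file against the MD5 values in the file list before running anything. The folder argument is a hypothetical placeholder for wherever the files are stored, and the function names are illustrative, not part of the supplement.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file list of this supplement.
EXPECTED_MD5 = {
    "Ecol_Monograph_supplement_code_biomod2.txt": "1468e75dbf74ed624a8dce871743f924",
    "Ecol_Monograph_supplement_code_dismo_1.txt": "55b20fbe747f7601c53d5b56a93459ea",
    "Ecol_Monograph_supplement_code_dismo_2.txt": "a33a1745062f1bf816c3d9ec797cdd46",
    "Ecol_Monograph_supplement_code_dismo_3.txt": "aff301c5ba52f04eff85e561122964c4",
    "Ecol_Monograph_supplement_code_dismo_4.txt": "244ff730dbd9da02a5439cfd95a439ca",
    "Ecol_Monograph_supplement_code_dismo_5.txt": "bec6a05bf1d737b941d0a7a00bde3658",
    "lot_line_section_with_predictors.csv": "48dc1b92e2d3d3b3e4875ef0dc3b87a7",
    "township_bt_post_with_predictors.csv": "86f08554a0a65fec8065f85335aa8ec5",
    "township_line_section_with_predictors.csv": "d028af68dcd8f7bca5b28e969cc5c796",
    "biomod2_predictors.zip": "7ab5a1d2ef1847fe64a47483e8220d70",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(base_dir, expected=EXPECTED_MD5) -> dict:
    """Map each expected file name to "ok", "mismatch", or "missing".

    base_dir is a hypothetical placeholder for the single folder holding all
    files; keeping everything in one folder means only one path needs editing.
    """
    base = Path(base_dir)
    status = {}
    for name, want in expected.items():
        path = base / name
        if not path.is_file():
            status[name] = "missing"
        elif md5_of(path) != want:
            status[name] = "mismatch"
        else:
            status[name] = "ok"
    return status
```

Calling `check_files("/path/to/supplement")` returns a dictionary mapping each file name to "ok", "mismatch", or "missing"; anything other than "ok" points to an incomplete download or a file stored outside that folder.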
Although the “biomod2” code is highly automated, it may still take several hours to a few days to run on a personal computer. The “dismo” code files may take several days to a week to run, and they require much more “manual” entry of code blocks into R. More advanced R users could instead edit the code to run as a script and/or to be more automated.

Each file is described below.

- Ecol_Monograph_supplement_code_biomod2.txt: code for training SDMs from the Holland Land Company (HLC) line-description (or “line section”) data, using three SDM algorithms from the “biomod2” package in R: Generalized Additive Models (GAMs), Generalized Linear Models (GLMs), and Multivariate Adaptive Regression Splines (MARS).

Five further .txt files contain code for training and evaluating boosted regression tree (BRT) models with the “dismo” package in R. The BRT code is split into five files, which must be run in sequence. Because BRT model fitting is stochastic, results may differ slightly from those reported in the article.

- Ecol_Monograph_supplement_code_dismo_1.txt: loads the training data and trains an initial set of BRT models.
- Ecol_Monograph_supplement_code_dismo_2.txt: runs a procedure that suggests how many variables can be dropped from the initial set of BRT models.
- Ecol_Monograph_supplement_code_dismo_3.txt: creates a set of simplified BRT models with fewer variables, as determined in the previous step.
- Ecol_Monograph_supplement_code_dismo_4.txt: loads the evaluation data and the raster versions of the predictor variables, projects the models into geographic space, calculates variable importance, plots response curves, and evaluates the models on the training data and the evaluation data.
- Ecol_Monograph_supplement_code_dismo_5.txt: saves the false positive rate and false negative rate of each model, as evaluated on the training data and on the evaluation data.

The .csv files contain the training and evaluation data:

- lot_line_section_with_predictors.csv: the line-description data used to train the SDMs.
- township_bt_post_with_predictors.csv: the township bearing-tree data used to evaluate the SDMs.
- township_line_section_with_predictors.csv: the township line-description data used to evaluate the SDMs.

The township data were used with the permission of Dr. Yi-Chen Wang. For more information about these datasets, see:

Wang, Y.-C. 2007. Spatial patterns and vegetation-site relationships of the presettlement forests in western New York, USA. Journal of Biogeography 34:500–513.

Tulowiecki, S. J., C. P. S. Larsen, and Y.-C. Wang. 2014. Effects of positional error on modeling species distributions: a perspective using presettlement land survey records. Plant Ecology 216:67–85.

The following table, provided in the attached file, describes the columns of the .csv files (sorted alphabetically by column name) and gives their checksum values. Except for the “weights” columns, the three .csv files share the same column names, though with different values. The evaluation data files (township_bt_post_with_predictors.csv and township_line_section_with_predictors.csv) have no case-weight columns, because case weights were used only when training models on the training data (lot_line_section_with_predictors.csv). None of the .csv files contains blank cells.

[Table: see the attached file.]

- biomod2_predictors.zip: the predictor variables in raster format (coordinate system: UTM Zone 17N) that were used to project the SDMs into geographic space, as part of training the SDMs and creating prediction surfaces.
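The notes on the .csv files (identical column names across all three files apart from the case-weight columns, which appear only in the training file, and no blank cells) can be checked before modeling. This Python sketch uses only the standard csv module. It assumes that weight columns can be recognized by the substring "weight" in their names; that is a guess, and the actual column names are listed in the attached table. The function names are hypothetical.

```python
import csv


def read_header_and_blanks(path):
    """Return (column names, count of blank cells) for one CSV file."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        blanks = 0
        for row in reader:
            blanks += sum(1 for cell in row if cell.strip() == "")
            # A short row is treated as having blank trailing cells.
            blanks += max(0, len(header) - len(row))
    return header, blanks


def is_weight_column(name):
    # Assumption: case-weight columns contain "weight" in their name.
    return "weight" in name.lower()


def check_consistency(training_path, evaluation_paths):
    """List problems found; an empty list means every check passed."""
    problems = []
    train_cols, blanks = read_header_and_blanks(training_path)
    if blanks:
        problems.append(f"{training_path}: {blanks} blank cell(s)")
    # Evaluation files should match the training columns minus the weights.
    expected = sorted(c for c in train_cols if not is_weight_column(c))
    for path in evaluation_paths:
        cols, blanks = read_header_and_blanks(path)
        if blanks:
            problems.append(f"{path}: {blanks} blank cell(s)")
        if sorted(cols) != expected:
            problems.append(f"{path}: columns differ from the training file "
                            "(ignoring weight columns)")
    return problems
```

For this supplement the call would be `check_consistency("lot_line_section_with_predictors.csv", ["township_bt_post_with_predictors.csv", "township_line_section_with_predictors.csv"])`, with each name prefixed by the folder where the files are stored.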
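Ecol_Monograph_supplement_code_dismo_2.txt is described only as a procedure that suggests how many variables can be dropped. In the dismo package this kind of step is commonly done with gbm.simplify(), which drops the least important predictor, refits, and tracks the change in predictive deviance under cross-validation; whether the supplement uses it is not stated here. The Python sketch below is a simplified, hypothetical version of that backward-elimination idea, not the supplement's code. The `importance` and `cv_deviance` callables stand in for refitting a BRT and scoring it on held-out folds, and the rule "choose the drop count with the smallest deviance change" is an assumption.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def suggest_drops(
    variables: Sequence[str],
    importance: Callable[[List[str]], Dict[str, float]],
    cv_deviance: Callable[[List[str]], float],
    max_drops: Optional[int] = None,
) -> Tuple[int, List[str], List[float]]:
    """Backward elimination sketch (hypothetical, simplified).

    Repeatedly removes the least important remaining variable and records the
    change in cross-validated deviance relative to the full model. Returns
    (suggested number of drops, variables to drop, deviance changes).
    """
    current = list(variables)
    limit = len(current) - 1  # always keep at least one predictor
    max_drops = limit if max_drops is None else min(max_drops, limit)
    baseline = cv_deviance(current)
    changes = [0.0]  # deviance change after 0, 1, 2, ... drops
    dropped: List[str] = []
    for _ in range(max_drops):
        scores = importance(current)  # importances recomputed after each refit
        weakest = min(current, key=lambda v: scores[v])
        current.remove(weakest)
        dropped.append(weakest)
        changes.append(cv_deviance(current) - baseline)
    # Smallest deviance change wins; ties go to the smaller number of drops.
    best = min(range(len(changes)), key=lambda k: changes[k])
    return best, dropped[:best], changes
```

The returned list of changes is worth inspecting directly, since a drop count that only marginally lowers deviance may not justify a simpler model; the final choice in the supplement is applied in the next file (dismo_3).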
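Projecting a model into geographic space, as done in dismo_4 with the rasters from biomod2_predictors.zip, amounts to applying the fitted model to every raster cell where all predictors have values. In R this is typically done with the predict() method for raster objects; the Python sketch below only illustrates the same cell-by-cell logic on small in-memory grids. The `predict` callable, the predictor names, and the use of None for no-data cells are illustrative assumptions, and the rasters are assumed to share one extent, resolution, and coordinate system (UTM Zone 17N in this supplement).

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[Optional[float]]]


def project(predict: Callable[[Dict[str, float]], float],
            rasters: Dict[str, Grid]) -> Grid:
    """Apply a fitted model cell by cell to a stack of aligned rasters.

    predict: hypothetical callable mapping {predictor name: value} to a
    predicted probability. Cells where any predictor is None stay None.
    """
    if not rasters:
        raise ValueError("at least one predictor raster is required")
    names = list(rasters)
    first = rasters[names[0]]
    nrow = len(first)
    ncol = len(first[0]) if first else 0
    # All rasters must have identical dimensions to be stacked.
    for name in names:
        grid = rasters[name]
        if len(grid) != nrow or any(len(row) != ncol for row in grid):
            raise ValueError(f"raster {name!r} is not aligned with {names[0]!r}")
    surface: Grid = []
    for i in range(nrow):
        out_row: List[Optional[float]] = []
        for j in range(ncol):
            values = {name: rasters[name][i][j] for name in names}
            if any(v is None for v in values.values()):
                out_row.append(None)  # no data for at least one predictor
            else:
                out_row.append(predict(values))
        surface.append(out_row)
    return surface
```

The output grid has the same shape as the inputs, so it can be written back out as a prediction surface with the same georeferencing as the predictor rasters.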
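Ecol_Monograph_supplement_code_dismo_5.txt saves false positive and false negative rates. Once model probabilities are converted to predicted presence or absence at some threshold, the usual definitions are FPR = FP / (FP + TN), the share of observed absences predicted as present, and FNR = FN / (FN + TP), the share of observed presences predicted as absent. The Python sketch below computes both. The threshold is an input because the supplement does not state here which threshold its code uses, so any particular value is an assumption; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def error_rates(observed: Sequence[int], predicted: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate).

    observed: 1 = presence, 0 = absence.
    predicted: model probabilities; a site counts as predicted present when
    its probability is >= threshold (the threshold is an assumption).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    tp = fp = tn = fn = 0
    for obs, prob in zip(observed, predicted):
        if obs not in (0, 1):
            raise ValueError(f"observed values must be 0 or 1, got {obs!r}")
        present = prob >= threshold
        if obs == 1 and present:
            tp += 1
        elif obs == 1:
            fn += 1
        elif present:
            fp += 1
        else:
            tn += 1
    # Rates are undefined (NaN) when there are no absences or no presences.
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    fnr = fn / (fn + tp) if fn + tp else float("nan")
    return fpr, fnr
```

For example, with observed = [1, 1, 1, 0, 0, 0, 0], predicted probabilities [0.9, 0.6, 0.2, 0.7, 0.1, 0.3, 0.4], and a threshold of 0.5, one of four absences is predicted present and one of three presences is predicted absent, giving FPR = 0.25 and FNR ≈ 0.333. Computing the rates separately for the training and evaluation data mirrors the split described for dismo_5.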

Created: 2023-06-28
