Supplement 1. R code and the data set necessary to conduct the Random Forest analysis.

Creator: Wiley
Published: 2020-09-04 02:10:43
License: no description available

Source: DataCite Commons (updated 2020-09-04; indexed 2024-07-25)

Download link:

https://wiley.figshare.com/articles/dataset/Supplement_1_R_code_and_the_data_set_necessary_to_conduct_the_Random_Forest_analysis_/3520547/1
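
After downloading, the files can be checked against the MD5 checksums given in the file list. The following is a minimal base-R sketch. The helper verify_md5() is hypothetical and not part of the supplement, and it assumes both files were saved under their original names in a single directory.

```r
# Checksums copied from the file list of this supplement
expected_md5 <- c(
  "dreissena_in_lakes_of_belarus.csv" = "3dc2d2f89af3064223358983c785771d",
  "r_script_random_forest.R"          = "af1295890d60bc832955e940889e4575"
)

# Hypothetical helper: compare the actual MD5 sums with the expected ones.
# tools::md5sum() returns NA for missing or unreadable files, so such
# files are reported with ok = FALSE.
verify_md5 <- function(expected, dir = ".") {
  paths  <- file.path(dir, names(expected))
  actual <- unname(tools::md5sum(paths))
  data.frame(
    file     = names(expected),
    expected = unname(expected),
    actual   = actual,
    ok       = !is.na(actual) & actual == unname(expected),
    stringsAsFactors = FALSE
  )
}
```

For example, running verify_md5(expected_md5) in the download directory returns one row per file; a row with ok = FALSE points to a missing, renamed, or altered file.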

Resource description:

File list:
dreissena_in_lakes_of_belarus.csv (MD5: 3dc2d2f89af3064223358983c785771d)
r_script_random_forest.R (MD5: af1295890d60bc832955e940889e4575)

Description:
This supplementary material contains the two files needed to fully reproduce the results obtained with the Random Forest classifier.

The first file, dreissena_in_lakes_of_belarus.csv, is a plain-text table with 553 records, each described by the following variables:
1. Lake_Code: numeric code uniquely identifying each lake (for reference only; not used explicitly in the analysis).
2. ZMpresence: indicator of whether a lake is infested with the zebra mussel (0 = not infested, 1 = infested).
3. LAREA: lake area.
4. LVOL: lake volume.
5. MAXD: maximum depth.
6. AVED: average depth.
7. SPECWATSHED: specific watershed (i.e., drainage area).
8. TRANSP: Secchi depth (water transparency).
9. COLOR: water color.
10. pH: water pH.
11. HCO3: HCO3 (bicarbonate) content.
12. SO4: SO4 (sulfate) content.
13. Cl: Cl (chloride) content.
14. Ca: Ca content.
15. Mg: Mg content.
16. TDS: total dissolved solids.
17. Fe: Fe content.
18. Si: Si content.
19. NH4: NH4 (ammonium) content.
20. NO2: NO2 (nitrite) content.
21. PO4: PO4 (phosphate) content.
22. PermOx: permanganate oxidizability.
23. N: latitude (decimal degrees).
24. E: longitude (decimal degrees).

Missing values in the data set are denoted as NA.

The second file, r_script_random_forest.R, loads the data into R (assuming that dreissena_in_lakes_of_belarus.csv is stored in the current R working directory), fits the Random Forest model, and plots the results. The analysis relies on four add-on packages: caret, geosphere, randomForest, and ggplot2. All of them are assumed to be installed on the user's computer already; if they are not, they can be downloaded free of charge from the Comprehensive R Archive Network (CRAN, cran.r-project.org) or installed directly from within R with the following command:
install.packages(c("caret", "geosphere", "randomForest", "ggplot2"))
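
Before running the supplied script, it can help to load the table and check it against the variable list above. The sketch below is hypothetical and uses only base R. It assumes the file is comma-separated with a header row whose column names match the 24 variable names exactly, which the description does not state explicitly. The names dreissena_vars, load_dreissena() and na_summary(), and the factor labels "absent"/"present", are illustrative choices.

```r
# Variable names as listed in the description of this supplement
dreissena_vars <- c(
  "Lake_Code", "ZMpresence", "LAREA", "LVOL", "MAXD", "AVED", "SPECWATSHED",
  "TRANSP", "COLOR", "pH", "HCO3", "SO4", "Cl", "Ca", "Mg", "TDS", "Fe",
  "Si", "NH4", "NO2", "PO4", "PermOx", "N", "E"
)

# Hypothetical loader; assumes a comma-separated file with a header row
# whose column names match dreissena_vars exactly.
load_dreissena <- function(path = "dreissena_in_lakes_of_belarus.csv") {
  lakes <- read.csv(path, na.strings = "NA", stringsAsFactors = FALSE)
  missing_cols <- setdiff(dreissena_vars, names(lakes))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # Store the 0/1 infestation indicator as a two-level factor
  lakes$ZMpresence <- factor(lakes$ZMpresence, levels = c(0, 1),
                             labels = c("absent", "present"))
  lakes
}

# Number of missing values per column, most incomplete columns first
na_summary <- function(lakes) {
  sort(colSums(is.na(lakes)), decreasing = TRUE)
}
```

With the file in the working directory, lakes <- load_dreissena() followed by nrow(lakes) should report the 553 records mentioned in the description, and na_summary(lakes) lists how many NA values each variable contains.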

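The description does not specify how the Random Forest model is configured, so the following is only an orientation sketch, not a reproduction of r_script_random_forest.R. It fits a classification forest with randomForest::randomForest() to a data frame containing the variables listed above, coercing ZMpresence to a factor so that classification (not regression) is performed. Two assumptions are made that the original script may not share: all variables except Lake_Code and ZMpresence are used as predictors, and lakes with any missing value are dropped (complete-case analysis). The function name fit_zm_forest() and its defaults are hypothetical.

```r
# Hypothetical complete-case Random Forest fit; NOT a copy of
# r_script_random_forest.R. Requires the randomForest package from CRAN.
fit_zm_forest <- function(lakes, ntree = 500, seed = 1) {
  if (!requireNamespace("randomForest", quietly = TRUE)) {
    stop("Package 'randomForest' is required; install it from CRAN.")
  }
  # Lake_Code is only an identifier, so it is excluded from the predictors
  predictors <- setdiff(names(lakes), c("Lake_Code", "ZMpresence"))
  # randomForest() stops on NA by default; here incomplete rows are dropped
  keep <- complete.cases(lakes[, c("ZMpresence", predictors)])
  # factor() turns a 0/1 column into classes and drops unused levels
  y <- factor(lakes$ZMpresence[keep])
  set.seed(seed)
  rf <- randomForest::randomForest(
    x = lakes[keep, predictors, drop = FALSE],
    y = y,
    ntree = ntree,
    importance = TRUE  # needed for permutation (accuracy-based) importance
  )
  list(
    model      = rf,
    n_used     = sum(keep),
    oob_error  = rf$err.rate[nrow(rf$err.rate), "OOB"],
    importance = randomForest::importance(rf, type = 1)
  )
}
```

For example, res <- fit_zm_forest(lakes) followed by res$oob_error and res$importance gives the out-of-bag error rate and the mean decrease in accuracy for each predictor. Because dropping incomplete lakes can remove many records, randomForest::na.roughfix(), which replaces NAs with column medians (numeric) or most frequent levels (factors), is one alternative worth comparing.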

Provider:

Wiley

Created:

2016-08-04