Supplement 1. R code and the data set necessary to conduct the Random Forest analysis.
Source: DataCite Commons; updated 2020-09-04; indexed 2024-07-25
Download link:
https://wiley.figshare.com/articles/dataset/Supplement_1_R_code_and_the_data_set_necessary_to_conduct_the_Random_Forest_analysis_/3520547/1
Description:
File list: dreissena_in_lakes_of_belarus.csv (MD5: 3dc2d2f89af3064223358983c785771d), r_script_random_forest.R (MD5: af1295890d60bc832955e940889e4575).
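You can use the checksums to confirm that a downloaded copy is intact. The following is a minimal base-R sketch. It assumes both files are in the current working directory. The helper name verify_md5 is hypothetical and is not part of the supplement.

```r
# Expected MD5 checksums, copied from the file list above.
expected_md5 <- c(
  "dreissena_in_lakes_of_belarus.csv" = "3dc2d2f89af3064223358983c785771d",
  "r_script_random_forest.R"          = "af1295890d60bc832955e940889e4575"
)

# Hypothetical helper: compare each file's MD5 with the expected value.
# Returns a named logical vector: TRUE = match, FALSE = mismatch,
# NA = file not found or unreadable (tools::md5sum gives NA for those).
verify_md5 <- function(expected, dir = ".") {
  paths  <- file.path(dir, names(expected))
  actual <- unname(tools::md5sum(paths))
  ok <- actual == unname(expected)
  names(ok) <- names(expected)
  ok
}
```

Calling verify_md5(expected_md5) returns one value per file:
- TRUE for an unmodified file.
- FALSE for a file whose contents differ.
- NA for a file that is not found.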
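This supplementary material contains the two files needed to fully reproduce the results obtained with the Random Forest classifier.
The first file, dreissena_in_lakes_of_belarus.csv, is a plain-text table with 553 records, each described by the following variables: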
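1. Lake_Code: numeric code uniquely identifying each lake (for reference only; not used explicitly in the analysis).
2. ZMpresence: indicator of whether a lake is infested with zebra mussel (0 = not infested, 1 = infested).
3. LAREA: lake area.
4. LVOL: lake volume.
5. MAXD: maximum depth.
6. AVED: average depth.
7. SPECWATSHED: specific watershed (i.e., drainage area).
8. TRANSP: Secchi depth (water transparency).
9. COLOR: water color.
10. pH: water pH.
11. HCO3: bicarbonate (HCO₃⁻) content.
12. SO4: sulfate (SO₄²⁻) content.
13. Cl: chloride (Cl⁻) content.
14. Ca: calcium (Ca²⁺) content.
15. Mg: magnesium (Mg²⁺) content.
16. TDS: total dissolved solids.
17. Fe: iron content.
18. Si: silicon content.
19. NH4: ammonium (NH₄⁺) content.
20. NO2: nitrite (NO₂⁻) content.
21. PO4: phosphate (PO₄³⁻) content.
22. PermOx: permanganate oxidizability (permanganate index).
23. N: latitude (decimal degrees).
24. E: longitude (decimal degrees).
Missing values in the data set are coded as NA.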
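The following is a minimal sketch of reading the table into R so that the NA codes are recognized and the response becomes a two-level factor, as a classifier expects. It assumes the CSV has a header row with the variable names listed above. The function names read_dreissena and count_missing, and the factor labels, are illustrative assumptions, not taken from the supplied script.

```r
# Hypothetical loader; assumes a header row with the variable names listed
# above and the literal string NA for missing values.
read_dreissena <- function(path = "dreissena_in_lakes_of_belarus.csv") {
  lakes <- read.csv(path, na.strings = "NA", stringsAsFactors = FALSE)
  # Recode the 0/1 indicator as a factor so that classifiers such as
  # randomForest treat the task as classification, not regression.
  lakes$ZMpresence <- factor(lakes$ZMpresence, levels = c(0, 1),
                             labels = c("not_infested", "infested"))
  lakes
}

# Number of missing (NA) values in each column.
count_missing <- function(lakes) colSums(is.na(lakes))
```

After loading, check the result:
- nrow() on the result should match the 553 records stated above.
- count_missing() shows which variables carry NA codes.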
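The second file, r_script_random_forest.R, loads the data into R (assuming that dreissena_in_lakes_of_belarus.csv is stored in the current R working directory), fits the Random Forest model, and plots the results. The analysis relies on four add-on packages: caret, geosphere, randomForest, and ggplot2. These packages are assumed to be already installed on the user's computer; if they are not, they can be downloaded free of charge from the Comprehensive R Archive Network (CRAN, cran.r-project.org) or installed directly from within R with the following command: install.packages(c("caret", "geosphere", "randomForest", "ggplot2")).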
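The package check and model fit described above can be sketched as follows. This is a hedged illustration, not the contents of r_script_random_forest.R. The following are all assumptions:
- the helper names find_missing_packages and fit_zm_forest;
- the ntree value;
- the choice to drop incomplete rows.

```r
# The four packages named in the description.
required_pkgs <- c("caret", "geosphere", "randomForest", "ggplot2")

# Return the subset of `pkgs` that cannot be loaded in this R session.
find_missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install only what is missing, and only in an interactive session.
to_install <- find_missing_packages(required_pkgs)
if (length(to_install) > 0 && interactive()) install.packages(to_install)

# Hypothetical fit; the settings here are illustrative, not the authors'.
fit_zm_forest <- function(lakes, ntree = 500) {
  if (!requireNamespace("randomForest", quietly = TRUE))
    stop("package 'randomForest' is not installed")
  lakes$Lake_Code <- NULL  # identifier only, not a predictor
  # randomForest() stops on NA by default (na.action = na.fail), so this
  # sketch simply drops incomplete rows.
  complete <- na.omit(lakes)
  complete$ZMpresence <- factor(complete$ZMpresence)  # classification, not regression
  randomForest::randomForest(ZMpresence ~ ., data = complete,
                             ntree = ntree, importance = TRUE)
}
```

To use it, read the data into a data frame `lakes`, with ZMpresence coded 0/1 or as a factor.
- fit_zm_forest(lakes) returns a randomForest object. Printing it reports the out-of-bag error estimate and the confusion matrix.
- randomForest::importance() on that object lists variable importance.

Dropping incomplete lakes may discard many records. If so, randomForest::na.roughfix() (median or most-frequent-value imputation) is one alternative, though not necessarily what the authors did.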
Provider:
Wiley
Created:
2016-08-04



