
Code from: Testing for normality in regression models: mistakes abound (but may not matter)

DataONE | Updated: 2025-05-17 | Indexed: 2025-05-31
Download link:
https://search.dataone.org/view/sha256:849d8b9aad17b7d512dd1c27435e96540e501072b4de7ddb665e70d3ee44b880
Resource description:
This study examines the misuse of normality tests in linear regression within ecology and biology, focusing on common misconceptions. A bibliometric review found that over 70% of ecology papers and 90% of biology papers incorrectly applied normality tests to raw data instead of model residuals. To assess the impact of this error, we simulated datasets with normal, interval, and skewed distributions across various sample and effect sizes. We compared statistical power between two approaches for deciding whether to use a parametric (t-test) or nonparametric (Mann-Whitney U test) method: testing the whole dataset for normality (incorrect) versus testing model residuals (correct). Our results showed minimal differences in statistical power between the approaches, even when normality was incorrectly tested on raw data. However, when residuals violated the normality assumption, using the Mann-Whitney U test increased statistical power by 3–4%. Overall, the study suggests that, while correctly...

# Code for Normality Test Study

[https://doi.org/10.5061/dryad.sqv9s4nd0](https://doi.org/10.5061/dryad.sqv9s4nd0)

## Description of the data and file structure

The data files include those required to reproduce the analysis in "It’s OK Not to be Normal: Usage of Normality Tests in Linear Models" by S.R. Midway and J.W. White.

### Files and variables

#### File: Normality_Code.zip

**Description:** Unzips to 5 files. "interval_sims.R", "lognormal_sims.R", and "normal_sims.R" are R scripts that generate the data used in the study, each based on its respective distribution. "normality_comp.R" is an R script that reproduces the comparison of different tests of normality. "workflows_power.R" is an R script that reproduces the 3 analytical decisions in the manuscript.

## Code/software

All code is included in the attached files. Every file is an R script that can be run with the free software R, together with the libraries specified in each script.
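The distinction drawn in the abstract, between testing the pooled raw response and testing model residuals, can be illustrated with a minimal sketch. This is not taken from the archived R scripts: it is written in Python using only the standard library, the sample size, effect size, and seed are hypothetical, and a Jarque-Bera test is swapped in purely because it is easy to write by hand (the source does not say which normality test the scripts apply). With two normally distributed groups and a large mean difference, the pooled data look bimodal, while the residuals from the two-group model (each value minus its group mean) remain normal.

```python
import math
import random
import statistics


def jarque_bera_pvalue(values):
    """Asymptotic Jarque-Bera p-value; small values suggest non-normality."""
    n = len(values)
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under normality, JB is approximately chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    return math.exp(-jb / 2.0)


# Hypothetical settings, chosen only for illustration.
rng = random.Random(1)
n_per_group = 200
effect = 5.0  # difference in group means, in standard-deviation units

control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
treated = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]

# Incorrect target: the pooled response, ignoring group membership.
raw = control + treated

# Correct target: residuals of the two-group linear model, i.e. each
# observation minus the mean of its own group.
control_mean = statistics.fmean(control)
treated_mean = statistics.fmean(treated)
residuals = ([y - control_mean for y in control]
             + [y - treated_mean for y in treated])

p_raw = jarque_bera_pvalue(raw)
p_resid = jarque_bera_pvalue(residuals)

print(f"normality p-value, pooled raw data: {p_raw:.3g}")
print(f"normality p-value, model residuals: {p_resid:.3g}")
```

In this setup the pooled data are rejected as non-normal even though the model's error term is normal by construction, which is the kind of mismatch the study attributes to testing raw data. The Jarque-Bera p-value relies on a large-sample approximation, so it should be treated as rough at the small sample sizes common in practice.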

## Access information

...

Created: 2025-05-17