A goodness-of-fit measure for logistic regression under separation
Source: Figshare. Updated: 2025-05-15. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/A_goodness-of-fit_measure_for_logistic_regression_under_separation/29083166
Description:
Logistic regression suffers from a serious problem known as separation. When the data are separated, the maximum likelihood estimator does not exist, and an estimate forced out of the fitting procedure may take an extremely large value. Separation often occurs when the dataset is small. Consequently, goodness-of-fit measures based on the likelihood ratio, and those based on covariance functions evaluated at the maximum likelihood estimate, indicate an excessively good fit regardless of the cause of the separation. The Firth and exact logistic regression methods are valid estimation methods under separation. We therefore propose using these methods to evaluate goodness of fit reasonably for statistical models fitted to small, separated datasets. Specifically, the goodness-of-fit measures based on covariance functions, which generalize the multiple correlation coefficient and are referred to as the regression correlation coefficient and the entropy coefficient of determination, are combined with the Firth and exact methods for separated data. In addition, we conducted a data analysis using the non-separation ratio, defined in terms of the regression depth.
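The following is a minimal Python sketch of the separation problem, not the authors' code. It assumes a made-up, completely separated toy dataset and a one-parameter model with a slope and no intercept. All function names are hypothetical. Under these assumptions, the ordinary log-likelihood keeps increasing as the slope grows, so no finite maximizer exists. Adding Firth's penalty, half the log of the Fisher information in this one-parameter case, gives a finite maximizer, located here by a coarse grid search.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_likelihood(beta, xs, ys):
    """Log-likelihood of a slope-only logistic model P(y=1|x) = sigmoid(beta * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta * x
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total

def firth_penalized_log_likelihood(beta, xs, ys):
    """Log-likelihood plus 0.5 * log(Fisher information) for the single slope."""
    info = 0.0
    for x in xs:
        z = beta * x
        # p * (1 - p), computed on the log scale to avoid underflow to zero
        info += x * x * math.exp(log_sigmoid(z) + log_sigmoid(-z))
    return log_likelihood(beta, xs, ys) + 0.5 * math.log(info)

# Hypothetical toy data: every x < 0 has y = 0 and every x > 0 has y = 1,
# so the sign of x separates the two outcome classes completely.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# Ordinary log-likelihood at increasingly large slopes: it approaches 0 from below.
ll_values = [log_likelihood(b, xs, ys) for b in (1.0, 10.0, 100.0)]

# Coarse grid search for the maximizer of the Firth-penalized log-likelihood.
grid = [0.05 * k for k in range(1, 401)]
beta_firth = max(grid, key=lambda b: firth_penalized_log_likelihood(b, xs, ys))
```

The grid search is used only to keep the sketch dependency-free. For a model with several parameters, the penalty is half the log-determinant of the Fisher information matrix, and a dedicated implementation would be the more practical choice.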
Created:
2025-05-15



