
Leveraging Unlabeled Data for Superior ROC Curve Estimation via a Semiparametric Approach

Source: Figshare. Updated 2025-01-07; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Leveraging_Unlabeled_Data_for_Superior_ROC_Curve_Estimation_via_a_Semiparametric_Approach/28156199
Description:
The receiver operating characteristic (ROC) curve is a widely used tool in fields such as economics, medicine, and machine learning for evaluating classification performance and comparing treatment effects. Clear, readily available labels are often missing when estimating an ROC curve, owing to labeling cost, time constraints, data privacy, and information asymmetry. Traditional supervised estimators rely solely on labeled data, in which every sample has a fully observed response variable. We propose a new set of semi-supervised (SS) estimators that exploit available unlabeled data (samples whose responses are not observed) to improve estimation precision in a semiparametric setting, where the distribution of the response variable in one group is assumed known up to a set of unknown parameters. By leveraging the flexibility of kernel smoothing, the proposed SS estimators are both adaptive and efficient. We establish their large-sample properties, which show that the SS estimators consistently outperform the supervised estimator under mild assumptions. Numerical experiments provide empirical support for these theoretical findings. Finally, we demonstrate the practical applicability of the methodology on two real datasets.
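For context, the "traditional supervised estimator" mentioned above can be illustrated with the standard empirical ROC curve, computed from labeled scores alone: at each threshold c, the false positive rate is the share of negatives scoring at least c and the true positive rate is the share of positives scoring at least c. The sketch below is only that baseline, written under the assumption that higher scores indicate the positive group; the function names `empirical_roc`, `empirical_auc`, and `trapezoid_area` are hypothetical. It is not the authors' semi-supervised estimator, which additionally uses unlabeled data and kernel smoothing and is not reproduced here.

```python
from typing import List, Sequence, Tuple


def empirical_roc(neg: Sequence[float], pos: Sequence[float]) -> List[Tuple[float, float]]:
    """Supervised empirical ROC from labeled data only (hypothetical helper).

    Returns (FPR, TPR) points for every distinct observed score used as a
    threshold c, where FPR = share of negatives >= c and TPR = share of
    positives >= c. Assumes higher scores indicate the positive group.
    """
    thresholds = sorted(set(neg) | set(pos), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for c in thresholds:
        fpr = sum(x >= c for x in neg) / len(neg)
        tpr = sum(y >= c for y in pos) / len(pos)
        points.append((fpr, tpr))
    return points  # last point is (1.0, 1.0) because c = minimum score


def trapezoid_area(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve via the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def empirical_auc(neg: Sequence[float], pos: Sequence[float]) -> float:
    """Mann-Whitney form of the empirical AUC: P(pos > neg), ties count 1/2."""
    total = 0.0
    for x in neg:
        for y in pos:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total / (len(neg) * len(pos))


if __name__ == "__main__":
    negatives = [1.0, 2.0, 3.0]
    positives = [2.0, 4.0, 5.0]
    roc = empirical_roc(negatives, positives)
    print(roc)
    print(trapezoid_area(roc), empirical_auc(negatives, positives))
```

The two area computations agree, since the trapezoidal area under the empirical ROC curve equals the Mann-Whitney statistic with ties counted as one half. This gives a quick consistency check for any more elaborate estimator built on top of this baseline.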

Created: 2025-01-07