
Meta-Feature Driven Medical Algorithm Performance Dataset (Medical | Healthcare)

Source: DataCite Commons | Updated: 2026-04-22 | Indexed: 2026-05-04
Download link:
https://data.mendeley.com/datasets/bkp4jdxr8w
Description:
This dataset is a specialized meta-learning dataset designed for algorithm selection in medical informatics. It contains 420 records derived from 42 distinct clinical datasets obtained from the OpenML repository. Each record represents the evaluation of one machine learning algorithm applied to a specific medical dataset, resulting in a structured collection of performance outcomes and statistical descriptors.
The dataset integrates meta-features that describe the intrinsic properties of each clinical dataset, alongside performance metrics of multiple algorithm families. These meta-features include statistical signatures such as entropy, class imbalance, skewness, kurtosis, correlation structure, and feature composition ratios. Together, they characterize the geometry and complexity of the underlying medical data.
In addition to statistical descriptors, the dataset tracks algorithm-level performance using metrics such as F1-score, accuracy, and execution time. Each record identifies the evaluated algorithm, its performance, and whether it achieved the best result for the corresponding dataset. A comparative reference layer is also included, storing the performance of all algorithm families for each dataset, enabling direct benchmarking.
The dataset further incorporates ETL audit information, including data cleaning impact, number of removed constant features, and final feature counts. Structural properties such as number of rows and columns are also included, providing insight into dataset scale and dimensionality.
Organized in CSV format, the dataset is compatible with standard analytical tools such as Python, R, and AutoML systems. All data is fully anonymized and derived from publicly available benchmarks, ensuring no exposure of patient-identifiable information.
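As a rough illustration of the kind of statistical meta-features named in the description, the sketch below computes class entropy, a class imbalance ratio, skewness, and excess kurtosis with the Python standard library only. The description does not state the exact definitions used to build the dataset, so the formulas here (base-2 Shannon entropy, majority-to-minority count ratio, population moments) are common textbook choices and should be read as assumptions. All function names are hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (base 2) of the class label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def imbalance_ratio(labels):
    """Majority class count divided by minority class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment over variance ** 2, minus 3."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3


# Toy inputs for illustration only; they are not taken from the dataset.
toy_labels = [0, 0, 0, 1]
toy_feature = [1.0, 2.0, 3.0]

print(class_entropy(toy_labels))
print(imbalance_ratio(toy_labels))
print(skewness(toy_feature))
print(excess_kurtosis(toy_feature))
```

Population moments are used here for simplicity; bias-corrected sample estimators are an equally plausible choice and would give slightly different values on small datasets.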
This dataset provides a unified foundation for studying the relationship between dataset characteristics and algorithm performance, supporting the development of automated model selection systems in healthcare analytics.
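A minimal sketch of how the per-record layout described above (one algorithm evaluated on one dataset) could be reduced to a best algorithm per dataset. The column names `dataset_id`, `algorithm`, and `f1_score` are hypothetical, since the description does not list the actual CSV headers, and the rows below are toy values for illustration, not records from the dataset.

```python
import csv
import io


def best_algorithm_per_dataset(rows, dataset_col="dataset_id",
                               algo_col="algorithm", metric_col="f1_score"):
    """Return {dataset: (algorithm, score)} keeping the highest score per dataset."""
    best = {}
    for row in rows:
        key = row[dataset_col]
        score = float(row[metric_col])
        # Keep the first algorithm seen on ties; replace only on a strictly higher score.
        if key not in best or score > best[key][1]:
            best[key] = (row[algo_col], score)
    return best


# Toy CSV with assumed headers; replace with the real file and real column names.
TOY_CSV = """dataset_id,algorithm,f1_score
d1,random_forest,0.81
d1,logistic_regression,0.78
d2,random_forest,0.66
d2,svm,0.70
"""

toy_rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
winners = best_algorithm_per_dataset(toy_rows)
for dataset, (algorithm, score) in winners.items():
    print(dataset, algorithm, score)
```

To run this against the downloaded file, open the CSV with `csv.DictReader` and pass the actual header names through the keyword arguments. If the file already carries a best-result flag, as the description suggests, that column could be compared against the output as a consistency check.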
Provider:
Mendeley Data
Created:
2026-04-22