CBC indices data set
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/p85w7tbjbh
Description:
Dataset composition
(1) 5,778 records: Non-SMOTE-NC (raw)
Original tabular hematology dataset (routine analyzer parameters) with four-class labels (Class 0–3; target emphasis on Class 2 = Hb E trait ± α-thalassemia). No resampling, no scaling. Intended for the non-SMOTE pipeline and baseline descriptive statistics.
(2) 5,778 records: Non-SMOTE-NC, Z-score
Same records as (1), with all model features standardized to Z-scores. No class rebalancing. Used for ReliefF ranking, model fitting, and internal testing in the non-SMOTE scenario.
(3) 20,000 records: SMOTE-NC (raw)
Class-rebalanced cohort generated from the 5,778-record source using SMOTE-NC to upsample minority classes while preserving categorical structure. Values remain in the original (unscaled) measurement units. Used for training/validation in the SMOTE-NC scenario.
(4) 20,000 records: SMOTE-NC, Z-score
Z-score–standardized version of (3) for model development in the SMOTE-NC pipeline (feature selection, tuning, and internal testing).
(5) 625 records: External data, Z-score
Independent cohort prepared for model inference with standardized (Z-score) features. Used exclusively for external validation of the final models.
(6) 625 records: External data (raw)
Raw (unscaled) version of the same independent cohort in (5). Retained for auditing, sensitivity checks, and any site-specific recalibration.
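For readers unfamiliar with SMOTE-NC, the way a single synthetic minority-class record is produced can be illustrated with a minimal pure-Python sketch. This is not the authors' code and it is a simplification: the column layout (two continuous values plus one categorical value), the neighbor count `k`, and the fixed `cat_penalty` per categorical mismatch are assumptions made for illustration (the published algorithm derives that penalty from the median standard deviation of the continuous features). Continuous features are interpolated between a record and one of its nearest neighbors; categorical features take the most frequent value among the neighbors.

```python
import random
from collections import Counter

def smote_nc_sample(minority, cat_idx, k=3, rng=None, cat_penalty=1.0):
    """Create one synthetic row from a list of minority-class rows.

    minority: list of rows (lists) mixing floats and categorical values;
              needs at least two rows.
    cat_idx:  set of column positions that hold categorical values.
    """
    rng = rng or random.Random()
    base = rng.choice(minority)

    def distance(a, b):
        # Euclidean distance on continuous columns, plus a fixed
        # penalty for every categorical column that differs.
        total = 0.0
        for j, (x, y) in enumerate(zip(a, b)):
            if j in cat_idx:
                total += cat_penalty ** 2 if x != y else 0.0
            else:
                total += (x - y) ** 2
        return total ** 0.5

    # k nearest minority neighbors of the chosen base row
    others = [row for row in minority if row is not base]
    neighbors = sorted(others, key=lambda row: distance(base, row))[:k]
    partner = rng.choice(neighbors)
    gap = rng.random()  # interpolation weight in [0, 1)

    synthetic = []
    for j, value in enumerate(base):
        if j in cat_idx:
            # Categorical: most frequent value among the neighbors
            votes = Counter(row[j] for row in neighbors)
            synthetic.append(votes.most_common(1)[0][0])
        else:
            # Continuous: point on the segment between base and partner
            synthetic.append(value + gap * (partner[j] - value))
    return synthetic

# Hypothetical rows: [hemoglobin, MCV, sex]; values are made up.
rows = [[12.1, 78.0, "F"], [11.8, 76.5, "F"], [12.5, 80.2, "M"], [11.5, 75.0, "F"]]
new_row = smote_nc_sample(rows, cat_idx={2}, k=3, rng=random.Random(0))
```

Repeating this draw until each minority class reaches the desired count gives a rebalanced set; how the class sizes were chosen to reach 20,000 records is not stated in the description.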
Notes:
All tabular sets use the same label definitions (Class 0 = normal/non-clinically significant, Class 1 = normal Hb typing ± possible α-thalassemia, Class 2 = Hb E trait ± α-thalassemia, Class 3 = other thalassemic patterns).
Z-score versions provide standardized features for ReliefF selection and model input; raw versions support QC and re-scaling if needed.
SMOTE-NC sets are for training/validation only; performance is reported on held-out internal tests and on the external cohort (n = 625).
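The re-scaling that the raw versions make possible can be sketched as follows. This is a minimal illustration, not the authors' procedure: the description does not say which set supplied the means and standard deviations for the published Z-score files, nor whether the sample or population standard deviation was used. The sketch assumes the sample standard deviation, fits the parameters on one reference set, and applies them unchanged to another cohort. The function names and the numbers are hypothetical.

```python
import statistics

def fit_zscore(rows):
    """Return per-column (mean, sd) pairs computed from a reference set."""
    columns = list(zip(*rows))
    # statistics.stdev is the sample SD (n - 1 denominator); an assumption here.
    return [(statistics.fmean(col), statistics.stdev(col)) for col in columns]

def apply_zscore(rows, params):
    """Standardize rows with previously fitted (mean, sd) pairs.

    A column with sd == 0 would raise ZeroDivisionError; constant
    columns need to be dropped or handled before calling this.
    """
    return [[(x - mean) / sd for x, (mean, sd) in zip(row, params)]
            for row in rows]

# Made-up reference values for two continuous features.
reference = [[10.0, 70.0], [12.0, 80.0], [14.0, 90.0]]
params = fit_zscore(reference)

# Standardize a separate cohort with the reference parameters,
# rather than with its own mean and SD.
external_z = apply_zscore([[13.0, 85.0]], params)
```

Reusing the reference parameters keeps the external features on the same scale the models saw during development; re-fitting on the external cohort instead would correspond to the site-specific recalibration mentioned for the raw external file.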
Created:
2025-12-18



