
Balanced Prostate Cancer Clinical Dataset with Hematological and Diagnostic Indicators for Risk Classification

Source: DataCite Commons | Updated: 2026-03-18 | Indexed: 2026-05-04
Download link:
https://data.mendeley.com/datasets/hrx8yms94t/1
Description:
This dataset is a clinically structured, near-balanced collection of prostate cancer patient records intended for data-driven analysis and machine learning applications in oncology research. It is derived from a publicly available prostate cancer dataset and has been expanded and preprocessed to make it easier to use for classification and predictive modeling tasks.

The final dataset consists of 1,752 patient records, each described by clinically relevant features associated with prostate cancer progression and diagnosis. It includes 8 independent variables and 1 binary target variable representing patient risk classification. To improve the reliability and robustness of analytical models, the classes have been resampled to a distribution of 55% (class 0) and 45% (class 1). Class 0 represents lower-risk or non-critical cases, while class 1 represents higher-risk or clinically significant cases.

The clinical indicators include tumor volume, prostate-specific antigen (PSA) level, Gleason score, and other diagnostic measurements widely used in oncology practice. Several features are log-transformed to maintain statistical consistency and improve modeling performance. The description names the following variables:

- lcavol: log-transformed cancer volume, indicating tumor burden
- lweight: log-transformed prostate weight, reflecting prostate size
- age: the patient's age
- lbph: log-transformed amount of benign prostatic hyperplasia (benign prostate enlargement)
- svi: seminal vesicle invasion, showing whether cancer has spread beyond the prostate
- lcp: log-transformed capsular penetration, indicating tumor extension
- gleason: Gleason score, a measure of cancer aggressiveness
- pgg45: proportion of higher-grade tumor cells
- lpsa: log-transformed prostate-specific antigen level, an important cancer biomarker
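As an illustration of what the log-transformed features mean in practice, the following minimal sketch converts log-scale values back to their original scale. It assumes the transform is the natural logarithm, which the description does not state explicitly, and the record values are invented for illustration only, not taken from the dataset.

```python
import math


def to_log(value: float) -> float:
    """Apply a natural-log transform (assumed base e) to a positive value."""
    if value <= 0:
        raise ValueError("log transform requires a positive value")
    return math.log(value)


def from_log(value: float) -> float:
    """Invert the assumed natural-log transform."""
    return math.exp(value)


# Hypothetical record: these numbers are made up for illustration.
record = {"lcavol": 1.0, "lpsa": 2.0}

# Strip the leading "l" from each name to label the back-transformed value.
original_scale = {name[1:]: from_log(value) for name, value in record.items()}

for name, value in original_scale.items():
    print(f"{name}: {value:.3f}")
```

Working on the log scale compresses the long right tail typical of measurements such as PSA, which is one common reason for storing these features in transformed form.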
The target variable Target is binary, where 0 indicates low risk and 1 indicates high risk.

Data preprocessing steps include:
- Handling of class imbalance through resampling techniques
- Data augmentation using controlled Gaussian noise to simulate real-world variability
- Normalization-friendly feature structure
- Random shuffling to eliminate ordering bias

This dataset is suitable for:
- Machine learning classification tasks
- Clinical risk prediction modeling
- Explainable AI (XAI) research
- Oncology decision support systems

Additionally, the dataset can support interdisciplinary research in:
- Clinical data science
- Cancer informatics
- Healthcare analytics

All data are anonymized and contain no personally identifiable information.
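The preprocessing steps listed above (resampling, Gaussian noise augmentation, shuffling) can be sketched as follows. This is not the dataset authors' pipeline, which the description does not specify. It is a minimal standard-library illustration under stated assumptions: random oversampling to equal class sizes stands in for the unnamed resampling technique (the published dataset is 55%/45%, not 50%/50%), the noise scale and the columns excluded from noise are hypothetical choices, and the toy rows are invented.

```python
import random


def oversample_minority(rows, target="Target", seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[target], []).append(row)
    largest = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(dict(r) for r in group)
        balanced.extend(dict(rng.choice(group)) for _ in range(largest - len(group)))
    return balanced


def augment_with_noise(rows, noise_scale=0.05, copies=1, seed=0,
                       skip=("svi", "gleason", "Target")):
    """Return the original rows plus noisy copies.

    Columns in `skip` (assumed here to be the discrete ones) are left unchanged.
    """
    rng = random.Random(seed)
    augmented = [dict(r) for r in rows]
    for _ in range(copies):
        for row in rows:
            noisy = dict(row)
            for name, value in row.items():
                if name not in skip:
                    noisy[name] = value + rng.gauss(0.0, noise_scale)
            augmented.append(noisy)
    return augmented


def shuffle_rows(rows, seed=0):
    """Return a shuffled copy so that row order carries no information."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


# Toy rows, invented for illustration only.
toy_rows = [
    {"lcavol": 0.5, "lpsa": 1.2, "svi": 0, "Target": 0},
    {"lcavol": 0.8, "lpsa": 1.5, "svi": 0, "Target": 0},
    {"lcavol": 1.1, "lpsa": 1.9, "svi": 0, "Target": 0},
    {"lcavol": 2.4, "lpsa": 3.6, "svi": 1, "Target": 1},
]

balanced = oversample_minority(toy_rows)
augmented = augment_with_noise(balanced, noise_scale=0.05, copies=1)
final_rows = shuffle_rows(augmented)
print(len(final_rows), sum(r["Target"] for r in final_rows))
```

One consequence worth keeping in mind when evaluating models on data prepared this way: noisy copies of the same source record can land on both sides of a random train/test split, so held-out scores may be optimistic unless the split is made before augmentation.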
Provider:
Mendeley Data
Created:
2026-03-18