Formation of a General Training Dataset for Breast Cancer Based on Nominal Features

Name: Formation of a General Training Dataset for Breast Cancer Based on Nominal Features
Creator: Mendeley Data
Published: 2026-04-15 13:04:35
License: No description available

DataCite Commons | Updated: 2026-04-15 | Indexed: 2026-05-04

Download link:

https://data.mendeley.com/datasets/fsb6wdyzpy

Description:

This dataset was developed from clinical data collected at the Surkhandarya branch of the Republican Specialized Scientific and Practical Medical Center of Oncology and Radiology of the Republic of Uzbekistan. During the study, breast cancer patient records were analyzed in collaboration with oncologists and domain experts, and key clinical and anamnestic features (symptoms) were identified.

Initially, a total of 1,054 patient records were collected. During preprocessing, the records were evaluated for completeness, reliability, and consistency with the selected features. Records that lacked sufficient information in sections such as complaints, medical history, life history, epidemiological history, local status, and major physiological systems (respiratory, cardiovascular, digestive, and urinary) were considered unsuitable and excluded. The resulting training dataset consists of 567 complete and reliable instances.

The dataset is represented in a nominal feature space, where each instance is described by 32 clinical and diagnostic features. These features were identified in collaboration with medical experts and represent key indicators for the early detection of breast cancer. At the next stage, an informative feature selection algorithm was applied to identify the most significant features, resulting in a reduced set of 18 features. Based on these selected features, the instances were grouped into 13 distinct classes. The distribution of instances across classes is as follows: Class 1 – 54, Class 2 – 73, Class 3 – 19, Class 4 – 39, Class 5 – 16, Class 6 – 5, Class 7 – 276, Class 8 – 45, Class 9 – 8, Class 10 – 10, Class 11 – 10, Class 12 – 7, and Class 13 – 5.

This dataset is intended for use in early diagnosis, classification, and predictive modeling of breast cancer, and can support the development of machine learning algorithms as well as clinical decision-making systems.
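
The class counts quoted in the description are strongly imbalanced, which matters for anyone training a classifier on this data. The following is a minimal sketch, not part of the original record, that checks the counts against the stated total of 567 and derives per-class shares and inverse-frequency weights. The names `CLASS_COUNTS`, `class_shares`, and `balanced_weights` are hypothetical, and the weighting formula (total / (number of classes × class count)) is one common heuristic assumed here, not a method the dataset authors describe.

```python
# Class counts copied from the dataset description (class label -> instances).
CLASS_COUNTS = {
    1: 54, 2: 73, 3: 19, 4: 39, 5: 16, 6: 5, 7: 276,
    8: 45, 9: 8, 10: 10, 11: 10, 12: 7, 13: 5,
}

# Number of records collected before preprocessing, per the description.
COLLECTED_RECORDS = 1054


def class_shares(counts):
    """Return each class's fraction of all instances."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balanced_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count).

    Assumed heuristic for illustration; rare classes get larger weights.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


total = sum(CLASS_COUNTS.values())
shares = class_shares(CLASS_COUNTS)
weights = balanced_weights(CLASS_COUNTS)

# Ratio between the largest and smallest class sizes.
imbalance_ratio = max(CLASS_COUNTS.values()) / min(CLASS_COUNTS.values())

# Fraction of collected records that survived preprocessing.
retention = total / COLLECTED_RECORDS

if __name__ == "__main__":
    print(f"total instances: {total}")
    print(f"retained after preprocessing: {retention:.1%}")
    print(f"largest/smallest class ratio: {imbalance_ratio:.1f}")
    for label in sorted(CLASS_COUNTS):
        print(f"class {label:>2}: n={CLASS_COUNTS[label]:>3} "
              f"share={shares[label]:.3f} weight={weights[label]:.3f}")
```

Because several classes contain as few as 5 instances, any stratified split of this data would likely need a small number of folds, and per-class metrics may be more informative than overall accuracy.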

Provider:

Mendeley Data

Created:

2026-04-15
