Breast Cancer Wisconsin (Diagnostic) Data Set
Payititi (帕依提提), indexed 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-26337.html
Resource description:
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The linear program used to obtain the separating plane in the 3-dimensional space is the one described in: K. P. Bennett and O. L. Mangasarian, "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software, vol. 1, 1992, pp. 23-34.
Attribute Information:
1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter² / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
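The geometric definitions in the list above (radius, perimeter, area, compactness) can be illustrated on a simple polygon. This is only a sketch: the helper `contour_features`, the use of the vertex average as the center, and the polygon input are assumptions made for illustration, not the procedure used to build the dataset.

```python
import math


def contour_features(points):
    """Compute radius, perimeter, area, and compactness of a closed polygon.

    `points` is a list of (x, y) vertices in order around the contour.
    Hypothetical helper: it mirrors the listed definitions, nothing more.
    """
    n = len(points)
    # Assumption: the "center" is the average of the contour points.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Radius: mean of distances from the center to points on the perimeter.
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n

    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        perimeter += math.hypot(x1 - x0, y1 - y0)
        twice_area += x0 * y1 - x1 * y0  # shoelace formula term
    area = abs(twice_area) / 2.0

    # Compactness exactly as written in the list: perimeter^2 / area - 1.0
    compactness = perimeter ** 2 / area - 1.0
    return {
        "radius": radius,
        "perimeter": perimeter,
        "area": area,
        "compactness": compactness,
    }


# A 2 x 2 square as a toy contour (not real data).
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
features = contour_features(square)
print(features)
```

For the toy square, the perimeter is 8, the area is 4, and the compactness formula evaluates to 8² / 4 - 1.0 = 15.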
For each image, the mean, standard error, and "worst" or largest (mean of the three largest values) of these features are computed, resulting in 30 total features. For example, field 3 is mean radius, field 13 is radius standard error (SE), and field 23 is worst radius.
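The field layout and the three per-image statistics can be sketched as follows. The names `field_name` and `summarize` are hypothetical, and the standard error is assumed here to be the sample standard deviation divided by the square root of the number of nuclei; the description does not state which estimator was used.

```python
import math

# The ten base features, in the order listed above (a through j).
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
# Fields 3-12 hold means, 13-22 standard errors, 23-32 "worst" values.
STATISTICS = ["mean", "SE", "worst"]


def field_name(field):
    """Return a label for a 1-based field number in the range 1..32."""
    if field == 1:
        return "ID number"
    if field == 2:
        return "diagnosis"
    if not 3 <= field <= 32:
        raise ValueError("field must be between 1 and 32")
    stat_index, feature_index = divmod(field - 3, 10)
    stat = STATISTICS[stat_index]
    feature = BASE_FEATURES[feature_index]
    return f"{feature} SE" if stat == "SE" else f"{stat} {feature}"


def summarize(values):
    """Return (mean, standard error, worst) for per-nucleus values.

    Needs at least three values, because "worst" is the mean of the
    three largest. The SE estimator is an assumption (see lead-in).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    se = math.sqrt(variance / n)
    worst = sum(sorted(values, reverse=True)[:3]) / 3
    return mean, se, worst


print(field_name(3), "|", field_name(13), "|", field_name(23))
# Toy per-nucleus values, not taken from the dataset.
mean, se, worst = summarize([1.0, 2.0, 3.0, 4.0, 5.0])
print(mean, se, worst)
```

The index arithmetic reproduces the example in the text: field 3 maps to mean radius, field 13 to radius SE, and field 23 to worst radius.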
All feature values are recoded with four significant digits.
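A value can be reduced to four significant digits with Python's general number format. This is a minimal sketch: `round_sig` is a hypothetical helper, and it assumes the recoding rounds rather than truncates, which the description does not specify.

```python
def round_sig(x, digits=4):
    """Round x to the given number of significant digits."""
    # The 'g' format keeps `digits` significant digits in total.
    return float(f"{x:.{digits}g}")


# Illustrative inputs only, not values from the dataset.
print(round_sig(17.98765))   # 17.99
print(round_sig(0.118412))   # 0.1184
print(round_sig(1001.49))    # 1001.0
```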
Missing attribute values: none
Class distribution: 357 benign, 212 malignant
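The class counts imply a moderately imbalanced dataset. The short calculation below derives the total sample count, the class shares, and the accuracy of a classifier that always predicts the majority class; the variable names are illustrative.

```python
# Class counts as stated in the description.
CLASS_COUNTS = {"benign": 357, "malignant": 212}

total = sum(CLASS_COUNTS.values())
shares = {label: count / total for label, count in CLASS_COUNTS.items()}

# A classifier that always predicts the majority class is the usual
# floor against which to compare a trained model.
majority_label = max(CLASS_COUNTS, key=CLASS_COUNTS.get)
baseline_accuracy = shares[majority_label]

print(total)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority baseline ({majority_label}): {baseline_accuracy:.1%}")
```

With 569 samples in total, about 62.7% are benign and 37.3% malignant, so any model should be judged against a baseline accuracy of roughly 62.7%.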
Provided by:
Payititi (帕依提提)
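Collected summary
Dataset introduction

Background and challenges
Background overview
This is a medical classification dataset for predicting whether a breast mass is benign or malignant from features of a digitized fine needle aspirate (FNA) image. It contains 30 features derived from ten real-valued attributes of the cell nuclei (radius, texture, area, and so on); for each attribute, the mean, the standard error, and the "worst" value (the mean of the three largest values) are recorded. The dataset has 569 samples, 357 benign and 212 malignant, with no missing values, and is suitable for machine-learning classification tasks.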
The summary above was collected and generated by 遇见数据集.
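For a classification task, each record has to be split into an identifier, a label, and a 30-element feature vector. The sketch below assumes a comma-separated text record with the 32 fields in the order given under Attribute Information; the delimiter, the helper name `parse_record`, and the 0/1 label encoding are assumptions, and the sample line is a placeholder rather than real data.

```python
def parse_record(line):
    """Split one record into (record_id, label, features).

    Assumes comma-separated fields: ID, diagnosis, then 30 feature values.
    The label is encoded as 1 for malignant (M) and 0 for benign (B).
    """
    fields = line.strip().split(",")
    if len(fields) != 32:
        raise ValueError(f"expected 32 fields, got {len(fields)}")
    record_id, diagnosis = fields[0], fields[1]
    if diagnosis not in ("M", "B"):
        raise ValueError(f"unexpected diagnosis code: {diagnosis!r}")
    features = [float(value) for value in fields[2:]]
    label = 1 if diagnosis == "M" else 0
    return record_id, label, features


# Placeholder record: a made-up ID and the numbers 1..30 as feature values.
placeholder = ",".join(["0000", "B"] + [str(float(i)) for i in range(1, 31)])
record_id, label, features = parse_record(placeholder)
print(record_id, label, len(features))
```

Keeping the ID out of the feature vector matters in practice, since an identifier carries no diagnostic information and should not be fed to a model.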



