Breast Cancer Wisconsin (Diagnostic) Data Set
Source: www.kaggle.com; updated 2016-09-25; indexed 2025-01-21
Download link:
https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
Description:
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
The linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].
This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/
It can also be found on the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
Attribute Information:
1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
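As a rough illustration of the geometric features in the list above, the following sketch computes radius, perimeter, area, and compactness for a contour given as polygon vertices. The names `contour_features` and `regular_polygon`, and the choice of the vertex mean as the center, are assumptions made for illustration; the description does not say how the center was located or how the contour was traced in the original images.

```python
import math


def contour_features(points):
    """Compute a few shape features for a closed contour.

    points: list of (x, y) vertices in order around the contour.
    """
    n = len(points)
    # Assumption: use the mean of the boundary points as the center.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # a) radius: mean distance from the center to the boundary points.
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n

    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        perimeter += math.hypot(x1 - x0, y1 - y0)  # c) sum of edge lengths
        twice_area += x0 * y1 - x1 * y0  # shoelace formula term
    area = abs(twice_area) / 2.0  # d) polygon area

    # f) compactness as written in the attribute list: perimeter^2 / area - 1.0
    compactness = perimeter ** 2 / area - 1.0

    return {
        "radius": radius,
        "perimeter": perimeter,
        "area": area,
        "compactness": compactness,
    }


def regular_polygon(n_vertices, r):
    """Hypothetical helper: vertices of a regular polygon centered at the origin."""
    return [
        (r * math.cos(2 * math.pi * k / n_vertices),
         r * math.sin(2 * math.pi * k / n_vertices))
        for k in range(n_vertices)
    ]


if __name__ == "__main__":
    print(contour_features([(0, 0), (1, 0), (1, 1), (0, 1)]))
    print(contour_features(regular_polygon(360, 10.0)))
```

Under this formula a unit square has compactness 4^2 / 1 - 1.0 = 15, while a near-circular contour approaches 4*pi - 1.0 (about 11.57), the smallest value any closed shape can reach.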
The mean, standard error and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
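The field layout and the three summary statistics can be sketched as follows. This is a minimal illustration, not the original authors' code: the function names are hypothetical, the standard error is assumed to be the sample standard deviation divided by the square root of the number of nuclei, and the ordering of fields between the three stated examples (3, 13, 23) is inferred from those examples.

```python
import math
import statistics

BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
STATISTICS = ["mean", "se", "worst"]


def summarize(values):
    """Summarize one base feature measured on every nucleus in an image."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: standard error = sample standard deviation / sqrt(n).
    se = statistics.stdev(values) / math.sqrt(n)
    # "Worst": mean of the three largest values, as stated in the description.
    worst = sum(sorted(values, reverse=True)[:3]) / 3
    return {"mean": mean, "se": se, "worst": worst}


def field_names():
    """Map 1-based field numbers to names for the 32 fields."""
    names = {1: "id", 2: "diagnosis"}
    field = 3
    # Assumed order: ten means (3-12), ten standard errors (13-22),
    # ten worst values (23-32), each in the a)-j) feature order.
    for stat in STATISTICS:
        for base in BASE_FEATURES:
            names[field] = f"{base} {stat}"
            field += 1
    return names


if __name__ == "__main__":
    names = field_names()
    for number in (3, 13, 23):
        print(number, names[number])
    print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))
```

With this layout, fields 3, 13, and 23 map to the radius mean, radius SE, and worst radius, matching the examples given in the text.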
All feature values are recoded with four significant digits.
Missing attribute values: none
Class distribution: 357 benign, 212 malignant
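The class counts above imply a simple reference point for any classifier trained on this data: always predicting the majority class. The short sketch below works through that arithmetic; the variable and function names are illustrative only.

```python
# Class counts as stated in the dataset description.
CLASS_COUNTS = {"B": 357, "M": 212}  # B = benign, M = malignant


def class_proportions(counts):
    """Return the total number of samples and each class's share of it."""
    total = sum(counts.values())
    return total, {label: n / total for label, n in counts.items()}


total, shares = class_proportions(CLASS_COUNTS)
# Accuracy of a classifier that always predicts the most common class.
majority_baseline = max(shares.values())

if __name__ == "__main__":
    print(f"samples: {total}")
    for label, share in shares.items():
        print(f"{label}: {share:.2%}")
    print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

The data set has 569 samples in total, about 62.7% benign and 37.3% malignant, so a model needs to beat roughly 62.7% accuracy to improve on the trivial baseline.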
Provider:
Kaggle



