
Breast Cancer Wisconsin (Diagnostic) Data Set

Source: www.kaggle.com | Updated: 2016-09-25 | Indexed: 2025-01-21
Download link:
https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
Description:
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

The linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found on the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute Information:
1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, and field 23 is Worst Radius.

All feature values are recoded with four significant digits.
Missing attribute values: none
Class distribution: 357 benign, 212 malignant
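The field layout described above can be sketched in a few lines of Python. This is only an illustration: it assumes the 30 feature fields are ordered as ten means, then ten standard errors, then ten "worst" values, which is what the field 3 / 13 / 23 example implies. The names it generates (such as `radius_mean`) and the helpers `build_field_names` and `field_name` are hypothetical and are not guaranteed to match the column headers of the downloadable file.

```python
# Base measurements in the order a) to j) from the attribute list.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# Three statistics per measurement, in assumed field order.
STATISTICS = ["mean", "se", "worst"]


def build_field_names():
    """Return 32 illustrative field names; index i holds field i + 1."""
    names = ["id", "diagnosis"]
    for stat in STATISTICS:
        for feature in BASE_FEATURES:
            names.append(f"{feature}_{stat}")
    return names


FIELD_NAMES = build_field_names()


def field_name(field_number):
    """Look up a field by its 1-based number, as used in the description."""
    return FIELD_NAMES[field_number - 1]


# Class counts as stated in the description.
BENIGN, MALIGNANT = 357, 212
TOTAL = BENIGN + MALIGNANT
# Accuracy of always predicting the majority class (benign).
MAJORITY_BASELINE = BENIGN / TOTAL

print(len(FIELD_NAMES))
print(field_name(3), field_name(13), field_name(23))
print(TOTAL, round(MAJORITY_BASELINE, 3))
```

Under these assumptions, fields 3, 13, and 23 resolve to the mean, standard error, and worst radius, matching the example in the description. The majority-class share is a useful reference point: a classifier on this data should beat the accuracy obtained by always predicting benign.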

Provider:
Kaggle