Wisconsin Breast Cancer (Prognostic) Dataset
Indexed by Payititi (帕依提提) on 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-25975.html
Resource description:
Creators:
1. Dr. William H. Wolberg, General Surgery Dept., University of Wisconsin, Clinical Sciences Center, Madison, WI 53792, wolberg@eagle.surgery.wisc.edu
2. W. Nick Street, Computer Sciences Dept., University of Wisconsin, 1210 West Dayton St., Madison, WI 53706, street@cs.wisc.edu, 608-262-6619
3. Olvi L. Mangasarian, Computer Sciences Dept., University of Wisconsin, 1210 West Dayton St., Madison, WI 53706, olvi@cs.wisc.edu
Donor: Nick Street
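Data Set Information: Each record represents follow-up data for one breast cancer case. The records are consecutive patients seen by Dr. Wolberg since 1984 and include only cases exhibiting invasive breast cancer with no evidence of distant metastases at the time of diagnosis. The first 30 features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe characteristics of the cell nuclei present in the image. A few of the images can be found at [Web Link].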
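The separation described above was obtained using the Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree Construction Via Linear Programming," Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method that uses linear programming to construct a decision tree. Relevant features were selected by an exhaustive search in the space of 1-4 features and 1-3 separating planes. The actual linear program used to obtain the separating plane in the 3-dimensional space is described in [K. P. Bennett and O. L. Mangasarian, "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets," Optimization Methods and Software 1, 1992, pp. 23-34].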
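For reference, the robust linear program (RLP) of Bennett and Mangasarian cited above is usually stated as follows. The rows of A (m × n) and B (k × n) are the points of the two sets to be separated, e is a vector of ones of matching length, and y, z are nonnegative slack vectors; this notation is ours and is not part of the dataset description.

```latex
\begin{aligned}
\min_{w,\,\gamma,\,y,\,z}\quad & \frac{1}{m}\, e^\top y + \frac{1}{k}\, e^\top z \\
\text{s.t.}\quad & A w - e\gamma + y \ge e, \\
& -B w + e\gamma + z \ge e, \\
& y \ge 0,\quad z \ge 0.
\end{aligned}
```

A solution (w, γ) defines the separating plane xᵀw = γ. The slacks y and z measure how far points of A and B fall on the wrong side of the bounding planes xᵀw = γ + 1 and xᵀw = γ - 1, and the 1/m and 1/k weights keep the larger set from dominating the objective.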
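The Recurrence Surface Approximation (RSA) method is a linear programming model that predicts Time To Recur using both recurrent and nonrecurrent cases. See references (i) and (ii) above for details of the RSA method.
This database is also available through the UW CS FTP server: `ftp ftp.cs.wisc.edu`, then `cd math-prog/cpo-dataset/machine-learn/WPBC/`.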
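The key idea of the RSA method described above, using nonrecurrent cases even though their recurrence time is unknown, can be illustrated with a loss function. This is a simplified sketch, not the exact RSA linear program from the cited papers: it assumes a linear predictor x·w + γ, and the names `predict` and `rsa_style_loss` are hypothetical. Recurrent cases are penalized by absolute error; nonrecurrent cases are penalized only when the prediction falls below their disease-free time, since that time is a lower bound on recurrence.

```python
from typing import List, Sequence, Tuple

Case = Tuple[Sequence[float], float]  # (feature vector, time)


def predict(w: Sequence[float], gamma: float, x: Sequence[float]) -> float:
    """Linear time-to-recur estimate x.w + gamma (the linear form is an assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + gamma


def rsa_style_loss(w: Sequence[float], gamma: float,
                   recur: List[Case], nonrecur: List[Case]) -> float:
    """Mean absolute error on recurrent cases plus a one-sided penalty on nonrecurrent ones."""
    # Outcome R: the recurrence time is observed, so penalize |prediction - time|.
    err_r = sum(abs(predict(w, gamma, x) - t) for x, t in recur) / max(len(recur), 1)
    # Outcome N: only the disease-free time is known (a lower bound on recurrence),
    # so only predictions *below* that time are penalized.
    err_n = sum(max(0.0, t - predict(w, gamma, x)) for x, t in nonrecur) / max(len(nonrecur), 1)
    return err_r + err_n


if __name__ == "__main__":
    recur = [([10.0], 12.0)]
    nonrecur = [([30.0], 24.0), ([5.0], 20.0)]
    print(rsa_style_loss([1.0], 0.0, recur, nonrecur))  # prints 9.5
```

Both terms are piecewise linear, which is why an objective of this shape can be rewritten with slack variables and minimized as a linear program.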
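Attribute Information:
1) ID number
2) Outcome (R = recur, N = nonrecur)
3) Time (recurrence time if field 2 = R, disease-free time if field 2 = N)
4-33) Ten real-valued features are computed for each cell nucleus; for each feature the mean, standard error, and largest value are recorded, giving 10 × 3 = 30 columns:
a) radius (mean of distances from the center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter² / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)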
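A minimal sketch of reading one record laid out as fields 1-33 above. It assumes comma-separated rows and assumes fields 4-33 are ordered as the 10 means, then the 10 standard errors, then the 10 largest values; the column names (`radius_mean`, `radius_se`, `radius_worst`, ...) and the helper `parse_record` are hypothetical. Any columns after field 33 are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List

# Base feature names, in the order a)-j) listed above.
BASE_FEATURES: List[str] = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry", "fractal_dimension",
]
# Assumption: fields 4-33 hold 10 means, then 10 standard errors, then 10 largest values.
# The "_mean" / "_se" / "_worst" suffixes are hypothetical names chosen for this sketch.
FEATURE_NAMES: List[str] = [
    f"{name}_{stat}" for stat in ("mean", "se", "worst") for name in BASE_FEATURES
]


@dataclass
class Record:
    record_id: str
    outcome: str                # "R" = recur, "N" = nonrecur
    time: float                 # recurrence time if R, disease-free time if N
    features: Dict[str, float]


def parse_record(line: str) -> Record:
    """Parse one comma-separated row whose first 33 fields follow the layout above."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 33:
        raise ValueError(f"expected at least 33 fields, got {len(fields)}")
    outcome = fields[1]
    if outcome not in ("R", "N"):
        raise ValueError(f"unexpected outcome {outcome!r}")
    values = [float(v) for v in fields[3:33]]  # fields 4-33; later columns are ignored
    return Record(fields[0], outcome, float(fields[2]), dict(zip(FEATURE_NAMES, values)))
```

For a classification task, `record.outcome == "R"` gives a binary label. For regression on `time`, note that for N cases the value is a disease-free time, a lower bound on the unobserved recurrence time rather than the recurrence time itself.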
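Relevant Papers:
1. W. N. Street, O. L. Mangasarian, and W. H. Wolberg. An inductive learning approach to prognostic prediction. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 522-530, San Francisco, 1995. Morgan Kaufmann. [Web Link]
2. O. L. Mangasarian, W. N. Street, and W. H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995. [Web Link]
3. W. H. Wolberg, W. N. Street, D. M. Heisey, and O. L. Mangasarian. Computerized breast cancer diagnosis and prognosis from fine needle aspirates. Archives of Surgery, 130, pages 511-516, 1995. [Web Link]
4. W. H. Wolberg, W. N. Street, and O. L. Mangasarian. Image analysis and machine learning applied to breast cancer diagnosis and prognosis. Analytical and Quantitative Cytology and Histology, 17(2), pages 77-87, April 1995.
5. W. H. Wolberg, W. N. Street, D. M. Heisey, and O. L. Mangasarian. Computer-derived nuclear "grade" and breast cancer prognosis. Analytical and Quantitative Cytology and Histology, 17, pages 257-264, 1995. [Web Link]
See also: [Web Link] [Web Link]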
Provided by:
Payititi (帕依提提)
Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
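The Wisconsin Breast Cancer (Prognostic) dataset contains follow-up data for breast cancer cases and is used for classification tasks (predicting the outcome, R or N) and regression tasks (predicting the time field). Its 30 features are computed from digitized images of fine needle aspirates (FNA) and describe characteristics of the cell nuclei, which makes the dataset suited to research on breast cancer prognosis.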
The above content was collected and summarized by 遇见数据集 (Meet Datasets).



