
Lung Cancer Dataset

Indexed by 帕依提提 (Payititi) on 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-26138.html
Resource description:
Data Set Information:
This data was used by Hong and Yang to illustrate the power of the optimal discriminant plane even in ill-posed settings. Applying the KNN method in the resulting plane gave 77% accuracy. However, these results are strongly biased (see the second Aeberhard reference under Relevant Papers, or email stefan '@' coral.cs.jcu.edu.au). Results obtained by Aeberhard et al. are: RDA 62.5%, KNN 53.1%, optimal discriminant plane 59.4%.
The data describe 3 types of pathological lung cancers. The authors give no information on the individual variables nor on where the data was originally used.

Notes:
- In the original data, 4 values for the 5th attribute were -1. These values have been changed to ? (unknown). (*)
- In the original data, 1 value for the 39th attribute was 4. This value has been changed to ? (unknown). (*)

Attribute Information:
Attribute 1 is the class label. All predictive attributes are nominal, taking on integer values 0-3.

Relevant Papers:
- Hong, Z.Q. and Yang, J.Y. "Optimal Discriminant Plane for a Small Number of Samples and Design Method of Classifier on the Plane", Pattern Recognition, Vol. 24, No. 4, pp. 317-324, 1991.
- Aeberhard, S., Coomans, D., De Vel, O. "Comparisons of Classification Methods in High Dimensional Settings", submitted to Technometrics.
- Aeberhard, S., Coomans, D., De Vel, O. "The Dangers of Bias in High Dimensional Settings", submitted to Pattern Recognition.

Papers That Cite This Data Set:
- Jinyan Li and Limsoon Wong. Using Rules to Analyse Bio-medical Data: A Comparison between C4.5 and PCL. WAIM. 2003.
- Manoranjan Dash and Huan Liu. Hybrid Search of Feature Subsets. PRICAI. 1998.
- Glenn Fung, Sathyakama Sandilya and R. Bharat Rao. Rule Extraction from Linear Support Vector Machines. Computer-Aided Diagnosis & Therapy, Siemens Medical Solutions, Inc.

Citation Request:
Please refer to the Machine Learning Repository's citation policy.
Data was published in: Hong, Z.Q. and Yang, J.Y. "Optimal Discriminant Plane for a Small Number of Samples and Design Method of Classifier on the Plane", Pattern Recognition, Vol. 24, No. 4, pp. 317-324, 1991.
Donor: Stefan Aeberhard, stefan '@' coral.cs.jcu.edu.au
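As a rough illustration of the layout described above (class label in attribute 1, nominal predictors with integer values 0-3, and ? for unknown values), the following minimal sketch parses such rows with the Python standard library. It assumes a comma-separated file, which the description does not state, and the sample rows and the parse_rows helper are hypothetical, shortened stand-ins rather than real records or an official loader.

```python
import csv
import io

# Hypothetical, shortened sample rows for illustration only (not real records).
# Assumption: fields are comma-separated; the first field is the class label.
SAMPLE = """1,0,3,?,2
2,1,2,0,3
3,0,?,1,1
"""


def parse_rows(text):
    """Split each row into a class label and a list of nominal attribute values.

    Unknown values marked "?" are stored as None.
    """
    labels, features = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        labels.append(int(row[0]))
        features.append(
            [None if value.strip() == "?" else int(value) for value in row[1:]]
        )
    return labels, features


labels, features = parse_rows(SAMPLE)
missing = sum(value is None for row in features for value in row)
print("labels:", labels)
print("missing values:", missing)
```

Keeping unknown values as None rather than substituting a number leaves the decision about how to treat them to whatever analysis comes next.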

Provided by:
帕依提提 (Payititi)
Collected summary
Dataset introduction
Background and challenges
Background overview
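This lung cancer dataset describes three types of pathological lung cancer and consists of a class label plus nominal predictive attributes taking integer values 0-3. Reported accuracy varies by method and by study: KNN applied in the optimal discriminant plane was reported at 77%, but that result is described as strongly biased, and Aeberhard et al. obtained 62.5% for RDA, 53.1% for KNN, and 59.4% for the optimal discriminant plane.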
The summary above was collected and generated by 遇见数据集.
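Because the accuracy figures quoted for this dataset depend heavily on how they were estimated, it may help to see one common way of scoring a KNN classifier on a small sample. The sketch below uses leave-one-out evaluation with a Hamming distance over nominal attributes. This is an assumption chosen for illustration, not the procedure used by Hong and Yang or by Aeberhard et al.; the toy data, the function names, and the choice to count an unknown value as a mismatch are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two nominal vectors differ.

    Assumption: an unknown value (None) on either side counts as a mismatch.
    """
    return sum(1 for x, y in zip(a, b) if x is None or y is None or x != y)


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k training rows closest to the query."""
    ranked = sorted(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=1):
    """Hold out each row in turn, predict it from the remaining rows."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy data for illustration only: three classes, attribute values in 0-3.
xs = [[0, 0, 1], [0, 0, 2], [3, 3, 1], [3, 3, 2], [1, 2, 0], [1, 2, None]]
ys = [1, 1, 2, 2, 3, 3]

accuracy = leave_one_out_accuracy(xs, ys, k=1)
print(f"leave-one-out accuracy: {accuracy:.3f}")
```

Each prediction here is made without access to the held-out row, so any step fitted on the full data beforehand (for example, choosing a projection plane) would need to be repeated inside the loop to keep the estimate honest.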