Liver Disorders Dataset

Name: Liver Disorders Dataset
Creator: 帕依提提
License: no description available

Indexed by 帕依提提 on 2024-03-04

Download link:

https://www.payititi.com/opendatasets/show-26135.html

Resource description:

The first 5 variables are all blood tests that are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. Each row in the dataset is the record of a single male individual.

Important note: the 7th field (selector) has been widely misinterpreted in the past as a dependent variable representing the presence or absence of a liver disorder. This is incorrect [1]. The 7th field was created by the BUPA researchers as a train/test selector. It is not suitable as a dependent variable for classification. The dataset does not contain any variable representing the presence or absence of a liver disorder.

Researchers who wish to use this dataset as a classification benchmark should follow the method used in experiments by the donor (Forsyth & Rada, 1986, Machine learning: applications in expert systems and information retrieval) and by others (e.g. Turney, 1995, Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm), who used the 6th field (drinks), after dichotomising it, as the dependent variable for classification. Because of the widespread misinterpretation in the past, researchers should take care to state their method clearly.

Attribute Information:
1. Mean corpuscular volume
2. Alkaline phosphatase
3. Alanine aminotransferase
4. Aspartate aminotransferase
5. Gamma-glutamyl transpeptidase
6. Drinks: number of half-pint equivalents of alcoholic beverages drunk per day
7. Selector: field created by the BUPA researchers to split the data into train/test sets

Relevant Papers:
[1] McDermott & Forsyth. Diagnosing a disorder in a classification benchmark. Pattern Recognition Letters, Volume 73. 2016.

Papers That Cite This Data Set:
Zhi-Hua Zhou and Yuan Jiang. NeC4.5: Neural Ensemble based C4.5. IEEE Trans. Knowl. Data Eng, 16. 2004.
Yuan Jiang and Zhi-Hua Zhou. Editing Training Data for kNN Classifiers with Neural Network Ensemble. ISNN (1). 2004.
Glenn Fung and M. Murat Dundar and Jinbo Bi and Bharat Rao. A fast iterative algorithm for fisher discriminant using heterogeneous kernels. ICML. 2004.
Jochen Garcke and Michael Griebel. Classification with sparse grids using simplicial basis functions. Intell. Data Anal, 6. 2002.
Michail Vlachos and Carlotta Domeniconi and Dimitrios Gunopulos and George Kollios and Nick Koudas. Non-linear dimensionality reduction techniques for classification and visualization. KDD. 2002.
Xavier Llorà and David E. Goldberg and Ivan Traus and Ester Bernadó i Mansilla. Accuracy, Parsimony, and Generality in Evolutionary Learning Systems via Multiobjective Selection. IWLCS. 2002.
Jochen Garcke and Michael Griebel. Data mining with sparse grids using simplicial basis functions. KDD. 2001.
Jochen Garcke and Michael Griebel and Michael Thess. Data Mining with Sparse Grids. Computing, 67. 2001.
Petri Kontkanen and Jussi Lahtinen and Petri Myllymäki and Henry Tirri. Unsupervised Bayesian visualization of high-dimensional data. KDD. 2000.
Carlotta Domeniconi and Jing Peng and Dimitrios Gunopulos. An Adaptive Metric Machine for Pattern Classification. NIPS. 2000.
Iñaki Inza and Pedro Larrañaga and Basilio Sierra and Ramon Etxeberria and Jose Antonio Lozano and José Manuel Peña. Representing the behaviour of supervised classification learning algorithms by Bayesian networks. Pattern Recognition Letters, 20. 1999.
Guido Lindner and Rudi Studer. AST: Support for Algorithm Selection with a CBR Approach. PKDD. 1999.
Kristin P. Bennett and Erin J. Bredensteiner. A Parametric Optimization Method for Machine Learning. INFORMS Journal on Computing, 9. 1997.
Jennifer A. Blue and Kristin P. Bennett. Hybrid Extreme Point Tabu Search. Department of Mathematical Sciences Rensselaer Polytechnic Institute. 1996.
Peter D. Turney. Cost-Sensitive Classification: Empirical evaluation of a Hybrid Genetic Decision Tree Induction Algorithm. CoRR, csAI/9503102. 1995.
Gabor Melli. A Lazy Model-based Approach to On-Line Classification. University of British Columbia. 1989.
Aynur Akkuş and H. Altay Guvenir. Weighting Features in k Nearest Neighbor Classification on Feature Projections. Department of Computer Engineering and Information Science Bilkent University.
Greg Ridgeway. The State of Boosting. Department of Statistics University of Washington.

Creators: BUPA Medical Research Ltd.
Donor: Richard S. Forsyth, 8 Grosvenor Avenue, Mapperley Park, Nottingham NG3 5DX, 0602-621676

Provider:

帕依提提

Collected summary

Dataset introduction

Background and challenges

Background overview

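This is a medical classification dataset for research on liver disorders in male individuals. It contains 5 blood-test measurements thought to be sensitive to alcohol-related liver disorders, plus the daily intake of alcoholic drinks. Important: the dataset contains no variable that directly labels disease. The seventh field is only a train/test selector, and researchers should use the sixth field (drinks), after dichotomising it, as the classification target.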

The above content was collected and summarized by 遇见数据集.
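
The recommended usage described above (dichotomise the sixth field as the target, and treat the seventh field only as a split indicator) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the donor's actual procedure: the column names, the file layout (headerless, comma-separated), the cut-off `threshold`, and the selector value `train_value` that marks training rows are all hypothetical, and the sample rows are invented for illustration rather than taken from the dataset.

```python
import csv
import io

# Column names are assumptions for this sketch; the description only numbers the fields.
COLUMNS = ["mcv", "alkphos", "sgpt", "sgot", "gammagt", "drinks", "selector"]

# Invented rows for illustration only; these are NOT records from the dataset.
SAMPLE_CSV = """\
90,70,30,25,40,0.5,1
88,65,22,19,15,1.0,2
93,80,41,33,75,4.0,1
95,60,28,21,52,6.0,2
"""


def load_rows(text):
    """Parse headerless comma-separated text into dicts of floats keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, map(float, row))) for row in reader if row]


def make_examples(rows, threshold=3.0, train_value=1.0):
    """Build (features, label) pairs and split them using the selector field.

    The label is 1 when drinks > threshold, else 0. Both `threshold` and
    `train_value` are assumptions; the description does not specify them.
    The selector is used only to split, never as a target or a feature.
    """
    feature_names = COLUMNS[:5]  # the five blood tests
    train, test = [], []
    for row in rows:
        features = [row[name] for name in feature_names]
        label = int(row["drinks"] > threshold)
        bucket = train if row["selector"] == train_value else test
        bucket.append((features, label))
    return train, test


rows = load_rows(SAMPLE_CSV)
train, test = make_examples(rows)
print(len(train), "training rows;", len(test), "test rows")
print("training labels:", [label for _, label in train])
```

Whatever cut-off is chosen, it should be reported alongside the results, since the description stresses that researchers state their method clearly.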