The number of missing values in each risk factor.

Source: Figshare | Updated: 2023-07-19 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/The_number_of_missing_values_in_each_risk_factor_/23711057
Description:
Cancer is a broad term for a wide range of diseases that can affect any part of the human body. Scientifically supported knowledge of the causes of cancer is critical both for reducing cancer deaths and for designing appropriate health policies to mitigate its spread. In this study, we therefore analyzed the risk factors associated with high-severity lung cancer using a decision tree-based feature ranking algorithm. This feature-relevance ranking algorithm uses split points to improve detection accuracy and computes a weight for each feature in the dataset; each risk factor is weighted according to the number of observations associated with it on the decision tree. Of the nine risk factors considered, coughing of blood, air pollution, and obesity received the highest weights for severe lung cancer, at 39%, 21%, and 14%, respectively.

We also propose a machine learning model based on Extreme Gradient Boosting (XGBoost) to detect the severity level of lung cancer in lung cancer patients. To assess its performance, we used a dataset of 1,000 lung cancer patients and 465 individuals without lung cancer from Tikur Ambesa (Black Lion) Hospital in Addis Ababa, Ethiopia. On the test set, the proposed severity detection model achieved an accuracy of 98.9%, a precision of 99%, and a recall of 98.9%. These findings can help governmental and non-governmental organizations make lung cancer-related policy decisions.
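
The ranking step can be illustrated with a small sketch. It assumes one plausible reading of the description above: a feature's weight is the number of observations that reach tree nodes splitting on that feature, expressed as a percentage of the total over all split nodes. The nested-dictionary tree format, the function name `feature_weights`, and the toy tree with its sample counts are hypothetical and do not come from the study; the authors' exact formula may differ. For comparison, scikit-learn's `feature_importances_` weights each split by its impurity decrease rather than by raw observation counts.

```python
from collections import defaultdict
from typing import Any, Dict


def feature_weights(tree: Dict[str, Any]) -> Dict[str, float]:
    """Weight each feature by the observations reaching nodes that split on it.

    Hypothetical node format (not from the study): split nodes carry
    "feature", "n_samples", "left" and "right"; leaves carry only "n_samples".
    Returns each feature's share of the total, as a percentage, ordered
    from highest to lowest weight.
    """
    counts: Dict[str, int] = defaultdict(int)

    def visit(node: Dict[str, Any]) -> None:
        if "feature" not in node:  # leaf: no split, contributes nothing
            return
        counts[node["feature"]] += node["n_samples"]
        visit(node["left"])
        visit(node["right"])

    visit(tree)
    total = sum(counts.values())
    if total == 0:  # tree with no splits at all
        return {}
    # Normalize to percentages and sort from highest to lowest weight.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {name: 100.0 * n / total for name, n in ranked}


if __name__ == "__main__":
    # Toy tree with made-up counts, for illustration only.
    toy_tree = {
        "feature": "coughing_of_blood", "n_samples": 800,
        "left": {
            "feature": "air_pollution", "n_samples": 500,
            "left": {"n_samples": 300},
            "right": {
                "feature": "obesity", "n_samples": 200,
                "left": {"n_samples": 120},
                "right": {"n_samples": 80},
            },
        },
        "right": {"n_samples": 300},
    }
    for name, weight in feature_weights(toy_tree).items():
        print(f"{name}: {weight:.1f}%")
```

With the toy tree, this prints weights of 53.3%, 33.3%, and 13.3%; these numbers only illustrate the arithmetic and are unrelated to the 39%, 21%, and 14% reported in the study.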

Created: 2023-07-19