
Results analysis using Random Forest.

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Results_analysis_using_Random_Forest_/29089377
Resource description:
Leukemia is a serious disease affecting both children and adults, and it is fatal if left untreated. It is a blood cancer characterized by the rapid proliferation of abnormal blood cells. Early, reliable, and precise identification of leukemia is important for treating patients and saving their lives. The four main kinds of leukemia are acute lymphocytic, acute myelogenous, chronic lymphocytic, and chronic myelogenous leukemia. Manual inspection of microscopic images is frequently used to identify the malignant cells. Symptoms include fatigue, listlessness, a dull complexion, recurring infections, and easy bleeding. Identifying leukemia subtypes so that therapy can be tailored to them is one of the hurdles in this area. The proposed work predicts and classifies leukemia subtypes in the CuMiDa gene dataset (GSE9476) using feature selection and ML techniques. The dataset, taken from the Curated Microarray Database (CuMiDa), comprises 64 samples representing five classes, each described by 22283 genes. The proposed approach uses the 25 most discriminative selected features for classification with machine learning and deep learning techniques. The study reports a classification accuracy of 96.15% using Random Forest, 92.30% using Linear Regression, 96.15% using SVM, and 100% using LSTM. In these results, the deep learning method (LSTM) outperforms the traditional methods in leukemia gene classification on the selected features.
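
The description does not say how the 25 features were selected. As a minimal sketch of one common option, the snippet below ranks features by a one-way ANOVA F-statistic and keeps the top 25. The scoring method, the function names (`f_score`, `top_k_features`, `make_toy_data`), and the synthetic data are all assumptions made for illustration. The toy data only mimics the shape of the problem (64 samples, five classes) and uses 200 features instead of 22283 to keep it fast.

```python
import random
from statistics import mean


def f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    # Variance of class means around the grand mean (between-class).
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    # Variance of samples around their own class mean (within-class).
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def top_k_features(X, y, k):
    """Return the indices of the k features with the highest F-statistic."""
    n_features = len(X[0])
    scores = [f_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=scores.__getitem__, reverse=True)[:k]


def make_toy_data(n_samples=64, n_features=200, n_classes=5, n_informative=25, seed=0):
    """Synthetic stand-in for the real data (hypothetical, not CuMiDa values)."""
    rng = random.Random(seed)
    y = [i % n_classes for i in range(n_samples)]
    # The first n_informative features shift with the class; the rest are noise.
    X = [
        [rng.gauss(0, 1) + (3.0 * label if j < n_informative else 0.0)
         for j in range(n_features)]
        for label in y
    ]
    return X, y


X, y = make_toy_data()
selected = top_k_features(X, y, k=25)
X_reduced = [[row[j] for j in selected] for row in X]
```

The reduced matrix `X_reduced` (64 rows, 25 columns) is what would then be passed to a classifier such as Random Forest or SVM. To avoid an optimistic accuracy estimate, the ranking would normally be computed on the training split only.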

Created:
2025-05-16