Feature selection and molecular classification of cancer phenotypes: a comparative study

DataCite Commons, updated 2022-08-09, indexed 2024-07-13
Download link:
http://researchdata.cab.unipd.it/id/eprint/679
Resource description:
Classification of high-dimensional gene expression data is key to the development of effective diagnostic and prognostic tools. Feature selection involves finding the subset of features with the highest power in predicting class labels. Here we conducted a comparative study of different combinations of feature selectors (Chi-Squared, mRMR, Relief-F, Genetic Algorithms) and classification learning algorithms (Random Forests, PLS-DA, SVM, Regularized Logistic/Multinomial Regression, kNN) to identify those with the best predictive capacity. The performance of each combination is evaluated through an empirical study on three benchmark cancer-related microarray datasets. Our results first suggest that the quality of the data relevant to the target classes is key for the successful classification of cancer phenotypes. We also found that, for a given classification learning algorithm and dataset, all filters perform similarly. Interestingly, filters achieve results comparable to or even better than those of the GA-based wrappers, while also being easier to implement and faster. Taken together, our findings suggest that simple, well-established feature selectors in combination with optimized classifiers guarantee good performance, with no need for complicated and computationally demanding methodologies.
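As a rough illustration of the kind of filter-plus-classifier combination the description compares, the sketch below pairs a chi-squared filter with a kNN classifier and estimates accuracy by cross-validation. It is not the study's code: the median binarization used to build the contingency table, the toy data generator, all function names, and the values of k and the number of retained features are assumptions made for this example. Feature selection is repeated inside each training fold so that the held-out samples never influence which features are kept.

```python
import random
from statistics import median


def chi2_score(column, labels):
    """Chi-squared statistic of a feature binarized at its median vs. binary labels."""
    cut = median(column)
    # 2x2 contingency table: rows = value above the cut?, columns = class label.
    table = [[0, 0], [0, 0]]
    for value, label in zip(column, labels):
        table[int(value > cut)][label] += 1
    n = len(labels)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = sum(table[i]) * (table[0][j] + table[1][j]) / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def select_top_k(X, y, k):
    """Indices of the k features with the highest chi-squared score."""
    n_features = len(X[0])
    scores = [chi2_score([row[f] for row in X], y) for f in range(n_features)]
    return sorted(range(n_features), key=lambda f: scores[f], reverse=True)[:k]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean distance)."""
    order = sorted(
        range(len(X_train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
    )
    votes = sum(y_train[i] for i in order[:k])
    return int(votes * 2 > k)


def cross_validated_accuracy(X, y, n_features_kept=5, n_folds=5):
    """k-fold cross-validation with the filter refit on every training fold."""
    correct = 0
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        kept = select_top_k(X_tr, y_tr, n_features_kept)
        X_tr_sel = [[row[f] for f in kept] for row in X_tr]
        for i in test_idx:
            x_sel = [X[i][f] for f in kept]
            correct += knn_predict(X_tr_sel, y_tr, x_sel) == y[i]
    return correct / len(X)


def make_toy_data(n_samples=60, n_features=200, n_informative=5, seed=0):
    """Hypothetical stand-in for a microarray matrix: few samples, many features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for f in range(n_informative):
            row[f] += 2.0 * label  # only the first few features carry class signal
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
accuracy = cross_validated_accuracy(X, y)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Swapping `chi2_score` for another univariate score, or `knn_predict` for another classifier, gives the other filter and classifier pairings in the same evaluation loop.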
Provided by:
Centro di Ateneo per le Biblioteche dell'Università degli Studi di Padova
Created:
2022-08-09