SOMOTE_EASY: AN ALGORITHM TO TREAT THE CLASSIFICATION ISSUE IN REAL DATABASES
Source: DataCite Commons | Updated: 2021-03-24 | Indexed: 2024-07-28
Download link:
https://scielo.figshare.com/articles/dataset/SOMOTE_EASY_AN_ALGORITHM_TO_TREAT_THE_CLASSIFICATION_ISSUE_IN_REAL_DATABASES/14287861
Resource description:
ABSTRACT Most classification tools assume that the data distribution is balanced or that misclassification costs are similar across classes. In practice, however, databases with unbalanced classes are commonplace, as in the diagnosis of diseases, where confirmed cases are usually rare compared with the healthy population. Other examples are the detection of fraudulent calls and the detection of system intruders. In these cases, misclassifying a minority-class instance (for instance, diagnosing a person with cancer as healthy) may have more serious consequences than misclassifying a majority-class instance. Therefore, it is important to treat databases in which unbalanced classes occur. This paper presents the SMOTE_Easy algorithm, which can classify data even when there is a high level of imbalance between classes. To demonstrate its efficiency, it was compared with the main algorithms for handling classification problems on unbalanced data. The approach was successful in nearly all tested databases.
Provider:
SciELO journals
Created:
2021-03-24



