A virtual multi-label approach to imbalanced data classification

Name: A virtual multi-label approach to imbalanced data classification
Creator: Chou, Elizabeth P.; Yang, Shan-Ping
Published: 2024-02-28 00:00:00
License: No description available

Taylor & Francis Group. Updated 2024-02-28; indexed 2026-04-16.

Download link:

https://tandf.figshare.com/articles/dataset/A_virtual_multi-label_approach_to_imbalanced_data_classification/19390561/1

Description:

Imbalanced data analysis is one of the most challenging issues in machine learning. In this type of research, correctly predicting minority labels is usually more critical than correctly predicting majority labels. However, traditional classifiers tend to place all subjects in the majority group, which results in biased predictions. Studies of this problem typically take one of two perspectives: a data-based perspective or a model-based perspective. Oversampling and undersampling are examples of data-based approaches, while adding costs, penalties, or weights to optimize the algorithm is typical of a model-based approach. Some ensemble methods have also been studied recently. These existing methods can cause various problems, such as overfitting, the omission of some information, and long computation times, and they do not apply to all kinds of datasets. To address these limitations, the virtual labels (ViLa) approach for the majority label is proposed to solve the imbalance problem. The study demonstrates a new multiclass classification approach that uses the equal K-means clustering method. The proposed method is compared with commonly used methods for imbalanced data, such as sampling methods (oversampling, undersampling, and SMOTE) and classifier methods (SVM and one-class SVM). The results show that the proposed method performs better as the degree of data imbalance increases and gradually outperforms the other methods.
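
The description above only outlines the idea, so the following is a minimal sketch of how virtual labels for the majority class might work, not the authors' implementation. Everything named here is an assumption made for illustration: the function names (`equal_kmeans`, `vila_fit`, `vila_predict`), the greedy capacity-limited assignment used to keep clusters equal in size, the choice of the number of virtual labels as the rounded imbalance ratio, and the nearest-centroid classifier, which stands in for whatever multiclass classifier the study actually uses.

```python
import math
import random


def equal_kmeans(points, k, iters=10, seed=0):
    """Split points into k clusters of near-equal size.

    Assumption: a greedy, capacity-limited variant of k-means, used here
    only to illustrate the idea of "equal K-means".
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    cap = math.ceil(len(points) / k)  # maximum points allowed per cluster
    labels = [0] * len(points)
    for _ in range(iters):
        # Visit every (point, cluster) pair from nearest to farthest.
        pairs = sorted(
            (math.dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centers)
        )
        sizes = [0] * k
        assigned = [False] * len(points)
        for _, i, j in pairs:
            if not assigned[i] and sizes[j] < cap:
                labels[i], assigned[i] = j, True
                sizes[j] += 1
        # Move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def fit_centroids(X, y):
    """Nearest-centroid classifier: store one mean vector per label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }


def vila_fit(X, y, majority=0, k=None):
    """Relabel the majority class with k virtual labels, then fit a multiclass model."""
    maj = [x for x, t in zip(X, y) if t == majority]
    rest = [(x, t) for x, t in zip(X, y) if t != majority]
    if k is None:
        # Assumption: one virtual label per "minority-sized" chunk of the majority.
        k = max(1, round(len(maj) / max(1, len(rest))))
    virtual = equal_kmeans(maj, k)
    X_new = maj + [x for x, _ in rest]
    y_new = [("virtual", v) for v in virtual] + [("real", t) for _, t in rest]
    return fit_centroids(X_new, y_new)


def vila_predict(model, x, majority=0):
    """Predict a label, mapping any virtual label back to the majority class."""
    kind, label = min(model, key=lambda key: math.dist(x, model[key]))
    return majority if kind == "virtual" else label


# Toy data with a 9:1 imbalance: a grid of majority points and a small minority group.
majority_pts = [(0.1 * (i % 10), 0.1 * (i // 10)) for i in range(90)]
minority_pts = [(5.0 + 0.1 * i, 5.0) for i in range(10)]
X = majority_pts + minority_pts
y = [0] * 90 + [1] * 10

model = vila_fit(X, y)
print(vila_predict(model, (5.4, 5.0)), vila_predict(model, (0.5, 0.5)))
```

In this sketch the 90 majority points become nine virtual classes of ten points each, so the classifier is trained on ten classes of equal size instead of two classes at a 9:1 ratio. At prediction time every virtual label collapses back to the original majority label.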

Provided by:

Chou, Elizabeth P.; Yang, Shan-Ping

Created:

2022-03-21
