
Imbalanced Data

Source: DataCite Commons | Updated: 2023-08-23 | Indexed: 2025-04-16
Download link:
https://ieee-dataport.org/documents/imbalanced-data-0
Description:
Classification learning on non-stationary data must cope with changes in the data distribution over time. The major problems in this setting are class imbalance and the high cost of labeling instances in the presence of drift. Imbalance arises when the minority class has far fewer samples than the majority class, and it leads to misclassification of data points. This paper proposes a technique for rebalancing data with an oversampling approach based on imputation methods, and uses a Hybrid Firefly Optimisation algorithm as a novel classifier. The Electricity dataset includes attributes related to power consumption, with a target indicating whether the electricity price goes up or down. The imputation methods increase the number of minority samples in a data chunk. The Firefly algorithm is adapted as a classification technique, with weights tuned using boosting ensemble classifiers. The proposed system is tested on seven synthetic datasets and five data stream generators. Evaluation metrics such as F-measure, AUC and G-mean are analyzed to investigate performance.
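
The description names F-measure and G-mean as evaluation metrics. As a minimal sketch of why these are preferred over plain accuracy on imbalanced data, the snippet below computes both from confusion-matrix counts, using their standard textbook definitions and treating the minority class as the positive class. The function name `imbalance_metrics` and the example counts are hypothetical and are not taken from the dataset or the paper. AUC is left out because it needs ranked classifier scores rather than counts.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Return (f_measure, g_mean) with the minority class as positive.

    Hypothetical helper for illustration; standard definitions only.
    """
    # Recall (sensitivity): share of minority samples that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of predicted-minority samples that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Specificity: share of majority samples that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # G-mean: geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean


# Made-up counts: a classifier that always predicts the majority class
# on 10 minority and 90 majority samples is 90% accurate, yet both
# metrics drop to zero.
always_majority = imbalance_metrics(tp=0, fn=10, fp=0, tn=90)

# Made-up counts: a classifier that recovers 8 of the 10 minority samples
# at the cost of 18 false alarms.
balanced = imbalance_metrics(tp=8, fn=2, fp=18, tn=72)
```

Both metrics depend on minority-class recall, so a model that ignores the minority class cannot score well on them regardless of its accuracy.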

Provider:
IEEE DataPort
Created:
2023-08-23