Balancing performance and environmental efficiency: a multiclass classification study of textual data

Source: DataCite Commons | Updated: 2025-11-17 | Indexed: 2026-05-07
Download link:
http://siba-ese.unisalento.it/index.php/ejasa/article/view/29391/25984
Description:
This study evaluates multiclass classification (MCC) strategies, namely One-Vs-Rest (OVA), One-Vs-One (OVO), Best-of-Best (BOB), and Error-Correcting Output Codes (ECOC), using classifiers such as Naïve Bayes, Random Forest, Linear Discriminant Analysis, Logistic Regression, Neural Networks, Support Vector Machine, and Threshold-based Naïve Bayes on the 20NewsGroup text dataset, which is well known in the literature for its complexity. The findings show that the choice of classifier significantly affects both accuracy and computational effort. Threshold-based Naïve Bayes excels with OVO, OVA, and BOB but declines with ECOC. Artificial Neural Network and Random Forest, the slowest classifiers, align well with BOB and OVA respectively. In contrast, Naïve Bayes and Logistic Regression stand out for speed, particularly with OVA. Along with the Support Vector Machine, these classifiers are versatile across all strategies, balancing accuracy and training time. Additionally, OVO and BOB prove advantageous for handling imbalanced data because they focus on individual class pairings. OVA emerges as the fastest strategy, while ECOC's performance is classifier-dependent. The analysis underscores the importance of selecting an appropriate classifier and strategy pairing in MCC tasks, particularly on imbalanced datasets. The study also highlights the environmental impact of computational choices, advocating efficient yet accurate predictions to minimize energy consumption and reduce the ecological footprint of machine learning applications.
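To make the strategies in the description concrete, the following is a minimal sketch, not the study's implementation, of how OVA and OVO split a 20-class problem into binary tasks and how an ECOC prediction can be decoded by Hamming distance. The function names, the class labels, and the toy codebook are hypothetical and chosen only for illustration; BOB is omitted because the description does not define it.

```python
from itertools import combinations


def ova_tasks(classes):
    """One binary task per class: that class versus all the others."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]


def ovo_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


def ecoc_decode(codebook, predicted_bits):
    """Return the class whose codeword is nearest in Hamming distance."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(codebook, key=lambda c: hamming(codebook[c], predicted_bits))


# Hypothetical labels standing in for the 20 newsgroup categories.
classes = [f"group_{i}" for i in range(20)]
n_ova = len(ova_tasks(classes))  # K binary tasks
n_ovo = len(ovo_tasks(classes))  # K * (K - 1) / 2 binary tasks

# Toy 3-class codebook with 5-bit codewords (made up for illustration).
codebook = {
    "a": (0, 0, 1, 1, 0),
    "b": (1, 0, 0, 1, 1),
    "c": (0, 1, 0, 0, 1),
}
# The bit vector below differs from the codeword of "b" in one position.
decoded = ecoc_decode(codebook, (1, 0, 0, 1, 0))

print(n_ova, n_ovo, decoded)
```

With 20 classes, OVA trains 20 binary models while OVO trains 190. Each OVO model sees only the examples of its two classes, which is one plausible reading of why pairwise strategies are described as helpful on imbalanced data.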
Provider:
University of Salento
Created:
2025-11-17