
UIC GII ML SMOTE

IEEE, indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/uic-gii-ml-smote
Resource description:
University-industry collaborations (UIC) drive innovation and technology transfer, yet predicting the performance of these partnerships remains a persistent methodological challenge: traditional statistical techniques struggle to capture non-linear relationships in the data. Machine learning (ML) models are increasingly applied in UIC research, but their use in Africa remains limited. This paper applies ML to classify UIC performance in 32 African countries into three categories (weak, moderate and strong), using Global Innovation Index panel data for 2013–2022. Three groups of indicators are analyzed: (1) institutional factors, (2) infrastructure, and (3) human capital and research. The study also identifies information and communication technology, research and development, the business environment, the regulatory environment and tertiary education as critical UIC success factors in Africa. These insights give policymakers, universities and industry stakeholders actionable strategies to strengthen collaborative outcomes and enhance Africa's global competitiveness. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is applied at ratios from 100% to 600%. A comparative evaluation of Random Forest (RF), K-Nearest Neighbors (KNN), Neural Network (NN) and J48 models shows that RF with SMOTE at 500% (RF-SMOTE500) achieves the highest classification accuracy (96.3%). It outperforms the other SMOTE-augmented models (KNN-SMOTE600: 93.6%, J48-SMOTE500: 90.2%, NN with SMOTE: 80.4%) and the baseline models without SMOTE (RF: 88.7%, NN: 86.1%, KNN: 84.7%, J48: 83.1%). Relative to the baselines, the best reported SMOTE configurations raise accuracy for RF (+7.6 percentage points), J48 (+7.1) and KNN (+8.9), underscoring the importance of data balancing. NN accuracy, however, falls by 5.7 points after SMOTE, which suggests that SMOTE should be applied with caution, particularly with neural networks.
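The description does not explain what a SMOTE ratio such as 500% means in practice. The following is a minimal pure-Python sketch of the standard SMOTE interpolation step. It assumes the usual N% convention, where each minority instance yields N/100 synthetic instances. Each synthetic instance is placed at a random point on the segment between the original instance and one of its k nearest minority-class neighbours. The function name `smote`, the default `k=5` and the `seed` parameter are illustrative assumptions; this is not the study's code, and the sketch does not reproduce the authors' settings.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], percent: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical SMOTE sketch: return synthetic minority-class samples.

    `percent` follows the N% convention: 100 -> one synthetic sample per
    original minority sample, 500 -> five per sample. For simplicity this
    sketch assumes `percent` is a positive multiple of 100, which covers
    the 100%-600% range mentioned above.
    """
    if percent <= 0 or percent % 100 != 0:
        raise ValueError("percent must be a positive multiple of 100")
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    per_sample = percent // 100
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic: List[Vector] = []
    for i, x in enumerate(minority):
        # k nearest minority-class neighbours of x, excluding x itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _sq_dist(x, m))[:k]
        for _ in range(per_sample):
            nn = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            # new point lies on the line segment from x towards its neighbour
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.3]]
    new_points = smote(minority, percent=500, k=3)
    print(len(new_points))  # 4 originals x 5 = 20 synthetic samples
```

With four minority samples, `percent=500` adds 20 synthetic points, so that class grows from 4 to 24 instances. In an evaluation pipeline, oversampling is normally applied only to the training folds so that synthetic points derived from test instances do not inflate the reported accuracy.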

Provided by: Ezenwa Nwanesi