Derived features and their formulas.

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://figshare.com/articles/dataset/Derived_features_and_their_formulas_/29439111
Description:
Predicting student performance is crucial for providing personalized support and improving academic outcomes. As educational data grows, advanced machine-learning approaches are increasingly used to understand the variables that drive student performance. This study uses a large dataset from several Chinese institutions and high schools to develop a reliable student performance prediction technique. The dataset includes 80 features and 200,000 records, making it one of the most extensive data collections available for educational research. The data is first preprocessed to address outliers and missing values. We then developed a novel hybrid feature selection model that combines correlation filtering with mutual information, Recursive Feature Elimination (RFE) with Cross-Validation (CV), and stability selection to identify the most impactful features. The study also proposes EffiXNet, a refined version of EfficientNet augmented with self-attention mechanisms, dynamic convolutions, improved normalization methods, and the Sparrow Search Optimization Algorithm for hyperparameter optimization. The model was tested using an 80/20 train-test split, with 160,000 records used for training and 40,000 for testing. The reported results, including accuracy, precision, recall, and F1-score, are based on the full test set; for readability, the confusion matrices display only a representative subset of the test results. EffiXNet achieved an AUC of 0.99, a 25% reduction in logarithmic loss relative to the baseline models, a precision of 97.8%, an F1-score of 98.1%, and efficient memory usage. The model performed consistently well across these metrics, which indicates that it captures intricate data patterns.
The key insight of this research is the need for early intervention and targeted training support in education. The EffiXNet framework offers a robust, scalable, and efficient solution for predicting student performance, with potential applications in academic institutions worldwide.
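
The split sizes and the relation between the reported metrics can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, and the recall figure it derives is not stated in the description. It is merely the value implied by the reported precision (97.8%) and F1-score (98.1%) under the standard definition F1 = 2PR / (P + R), which holds exactly only when all three metrics come from the same class or the same averaging scheme.

```python
def train_test_sizes(n_records: int, train_fraction: float) -> tuple[int, int]:
    """Return (train, test) record counts for a simple fractional split."""
    n_train = round(n_records * train_fraction)
    return n_train, n_records - n_train


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    return f1 * precision / (2 * precision - f1)


# Figures taken from the description: 200,000 records, 80/20 split.
n_train, n_test = train_test_sizes(200_000, 0.8)
print(n_train, n_test)  # 160000 40000

# Assumption: precision and F1 describe the same class/averaging scheme.
recall = implied_recall(0.978, 0.981)
print(f"{recall:.3f}")  # 0.984
```

Under that assumption the implied recall is roughly 98.4%, slightly above the reported precision, which is consistent with an F1-score that sits between the two.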

Created:
2025-06-30