GWO control parameters settings.

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://figshare.com/articles/dataset/GWO_control_parameters_settings_/27090640
Description:
DNA splice junction classification is a crucial task in computational biology. The challenge is to predict the junction type (IE, EI, or N) from a given DNA sequence. Predicting the junction type is important for understanding gene expression patterns, disease causes, splicing regulation, and gene structure. The locations at which exons are joined and introns are removed during RNA splicing are difficult to determine because no universal rule governs the process. This study presents a two-layer hybrid approach inspired by ensemble learning to overcome this challenge. The first layer applies the grey wolf optimizer (GWO) for feature selection. GWO's exploration ability allows it to search a vast feature space efficiently, while its exploitation ability refines promising areas, leading to more reliable feature selection. The selected features are then fed into the second layer, which employs a classification model trained on those features. Using cross-validation, the proposed method divides the DNA splice junction dataset into training and test sets, allowing for a thorough examination of the classifier's generalization ability. The ensemble model is trained on various partitions of the training set and tested on the remaining held-out fold. This process is repeated for each fold, giving a comprehensive evaluation of the classifier's performance. We tested our method using the StatLog DNA dataset. In a comparison with various machine learning models for DNA splice junction prediction, the proposed GWO+SVM ensemble method achieved an accuracy of 96%. This finding suggests that the proposed ensemble hybrid approach is promising for DNA splice junction classification. The implementation code for the proposed approach is available at https://github.com/EFHamouda/DNA-splice-junction-prediction.
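
The two-layer idea described above (GWO picks a feature subset, a classifier is scored on held-out folds) can be illustrated with a minimal sketch. This is not the authors' implementation, which lives in the linked repository. Several things here are assumptions made for illustration: a nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library, wolf positions are binarized with a simple 0.5 threshold, the data are synthetic rather than the StatLog DNA dataset, and the values `n_wolves=8` and `n_iter=15` are placeholders, not the control parameter settings this dataset records. All function names are hypothetical.

```python
import random

def kfold_indices(n_samples, k, rng):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_accuracy(X, y, mask, folds):
    """Mean held-out accuracy on the masked features.

    Uses nearest-centroid as a stand-in for the SVM (assumption).
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    scores = []
    for test in folds:
        held_out = set(test)
        sums, counts = {}, {}
        for i in range(len(X)):  # accumulate class centroids on the training part
            if i in held_out:
                continue
            acc = sums.setdefault(y[i], [0.0] * len(cols))
            for c, j in enumerate(cols):
                acc[c] += X[i][j]
            counts[y[i]] = counts.get(y[i], 0) + 1
        centroids = {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

        def predict(row):
            return min(centroids, key=lambda lab: sum(
                (row[j] - centroids[lab][c]) ** 2 for c, j in enumerate(cols)))

        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

def binary_gwo(fitness, n_features, n_wolves=8, n_iter=15, seed=0):
    """Grey wolf optimizer over [0, 1]^n; a position > 0.5 means 'keep feature'."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    best_mask, best_fit = None, -1.0
    for t in range(n_iter):
        ranked = sorted(((fitness([p > 0.5 for p in w]), w) for w in wolves),
                        key=lambda pair: pair[0], reverse=True)
        if ranked[0][0] > best_fit:  # remember the best subset evaluated so far
            best_fit = ranked[0][0]
            best_mask = [p > 0.5 for p in ranked[0][1]]
        leaders = [w for _, w in ranked[:3]]  # alpha, beta, delta wolves
        a = 2.0 * (1 - t / n_iter)  # shrinks from 2 towards 0: exploration, then exploitation
        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(n_features):
                steps = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    steps.append(leader[d] - A * abs(C * leader[d] - w[d]))
                pos.append(min(1.0, max(0.0, sum(steps) / 3)))  # average, clipped to [0, 1]
            new_wolves.append(pos)
        wolves = new_wolves
    return best_mask, best_fit

# Synthetic demo: feature 0 carries the class signal, features 1-5 are noise.
rng = random.Random(42)
labels = ["EI", "IE", "N"]
y = [labels[i % 3] for i in range(60)]
X = [[labels.index(lab) + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(5)]
     for lab in y]
folds = kfold_indices(len(X), 5, rng)
mask, score = binary_gwo(lambda m: cv_accuracy(X, y, m, folds), n_features=6)
print(mask, round(score, 3))
```

The fitness passed to `binary_gwo` is just a function from a boolean mask to a score, so a cross-validated SVM could be substituted for `cv_accuracy` without touching the optimizer. Scoring subsets on the same folds that are later used to report accuracy can make that accuracy optimistic, so a stricter setup would nest the feature search inside each training split.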

Created: 2024-09-23