
Additional file 1 of DBCSMOTE: a clustering-based oversampling technique for data-imbalanced warfarin dose prediction

Source: Figshare. Updated 2020-10-22. Indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Additional_file_1_of_DBCSMOTE_a_clustering-based_oversampling_technique_for_data-imbalanced_warfarin_dose_prediction/13126607
Description:
Additional file 1. DBCSMOTE.zip: Matlab code files for generating minority and majority clusters.

DBCSMOTE_demo.m: a demo of DBCSMOTE combined with random forest that outputs the estimated dosage. ‘num’ is the number of iterations for which DBCSMOTE is run. In each iteration, ‘evaluatePop’ calls the function that evaluates the oversampling quality. ‘train.txt’, ‘validate.txt’ and ‘test.txt’ are the subsets used for training, validation and testing.

DBSCAN_fun.m: the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) function. It clusters an input dataset using two parameters (Eps and MinPts) and returns the samples of the minority clusters and the number of clusters.

RandomForest.m: the random forest function. Random forest is an ensemble of CARTs (Classification and Regression Trees), which serve as weak regression models. The CARTs are built on the training set extended by DBCSMOTE.

CARTprediction.m: the CART function. CART is the weak regression model used within the random forest, and it also serves as the tool for evaluating the quality of the oversampling produced by DBCSMOTE.
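
To illustrate how a DBSCAN step with the two parameters Eps and MinPts behaves, here is a minimal sketch in Python. It is not the Matlab code in DBCSMOTE.zip and does not reproduce DBSCAN_fun.m. The names `dbscan`, `region_query`, `eps` and `min_pts` are hypothetical, chosen for illustration, and the sketch assumes Euclidean distance. It returns a cluster label per sample and the number of clusters. How the archive decides which clusters count as minority clusters is not stated on this page, so that step is left out.

```python
import math

NOISE = -1        # label for samples that belong to no cluster
UNVISITED = None  # marker for samples not yet processed


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Cluster points with DBSCAN (illustrative sketch, Euclidean distance assumed).

    Returns (labels, n_clusters), where labels[i] is a cluster id >= 0 or NOISE.
    """
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may still be claimed later as a border point.
            labels[i] = NOISE
            continue
        # Start a new cluster from this core point and grow it.
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster_id += 1
    return labels, cluster_id


# Two dense groups and one isolated sample (made-up toy data).
samples = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
labels, n_clusters = dbscan(samples, eps=1.5, min_pts=3)
print(labels, n_clusters)
```

On this toy input the two dense groups become two clusters and the isolated sample is labeled as noise. A larger Eps or a smaller MinPts merges more samples into clusters, which is why both parameters have to be supplied to the clustering function.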

Created:
2020-10-22