Synthetic Data for Model Ensembling
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/globusharris/ensembling-constrained-optimization
Description:
This synthetic dataset has 20 features and 4-dimensional labels and is designed to evaluate ensemble learning methods in constrained optimization settings. The features follow a multivariate normal distribution, with a final categorical feature that randomly assigns each data point to one of five categories. The labels are a noisy linear function of the features. The training set contains 10,000 data points and the evaluation set contains 400. The task is model ensembling for optimization.
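The description above can be made concrete with a short generator. This is a minimal sketch, not the authors' script: it assumes the 20 features are 19 Gaussian columns plus the categorical column, a zero mean and identity covariance, labels that depend only on the Gaussian columns, and a noise level of 0.1. The names `make_synthetic`, `W`, and `noise_std` are hypothetical; the linked repository may do any of this differently.

```python
import numpy as np


def make_synthetic(n, w, rng, n_categories=5, noise_std=0.1):
    """Draw n points: Gaussian features, one random category, noisy linear labels.

    All distribution parameters here are illustrative assumptions,
    not values taken from the dataset's source.
    """
    n_cont = w.shape[0]
    # Continuous features from a multivariate normal (assumed mean 0, covariance I).
    x_cont = rng.multivariate_normal(np.zeros(n_cont), np.eye(n_cont), size=n)
    # Final categorical feature: each point is assigned uniformly to one of five categories.
    category = rng.integers(0, n_categories, size=n)
    features = np.column_stack([x_cont, category])
    # Labels: linear in the continuous features plus Gaussian noise (assumed form).
    labels = x_cont @ w + noise_std * rng.standard_normal((n, w.shape[1]))
    return features, labels


rng = np.random.default_rng(0)
W = rng.standard_normal((19, 4))  # hypothetical weights: 19 continuous features -> 4 labels

X_train, Y_train = make_synthetic(10_000, W, rng)  # training set size from the description
X_eval, Y_eval = make_synthetic(400, W, rng)       # evaluation set size from the description
```

Sharing `W` across both calls keeps the training and evaluation sets on the same underlying linear relationship, so only the sampled points and noise differ.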
Provided by:
Generated by authors



