Gaussian Mixture high-dimensional datasets
Source: doi.org | Indexed: 2025-03-26
Download link:
http://doi.org/10.17632/xj9vyybgzn.1
Description:
Seven datasets, each made up of c Gaussian-shaped clusters. For each dataset, n points with p dimensions were generated from a mixture of c Gaussian distributions (clusters). The cluster means were drawn at random between 0 and 10 and arranged in a c x p matrix. The spread of each cluster is described by a p x p covariance matrix generated from normally distributed random numbers. Finally, given the c x p matrix of means and the c covariance matrices of size p x p, the mvrnorm function from the MASS library in R was used to draw the samples that compose each Gaussian mixture dataset.
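The generation procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' R script: the original data were produced with mvrnorm, while this sketch uses Python's standard library. The description does not say how the random covariance matrices were made valid, so the sketch assumes each covariance is A Aᵀ for a random normal matrix A. The function name `sample_gaussian_mixture`, the near-equal cluster sizes, and the labels 1..c are likewise hypothetical.

```python
import random


def sample_gaussian_mixture(n, p, c, seed=0):
    """Draw n labeled points in p dimensions from a mixture of c Gaussians."""
    rng = random.Random(seed)

    # c x p matrix of cluster means, each entry uniform between 0 and 10.
    means = [[rng.uniform(0.0, 10.0) for _ in range(p)] for _ in range(c)]

    # Assumption: the covariance of cluster k is A_k @ A_k.T, where A_k is a
    # p x p matrix of standard-normal entries. This is always symmetric and
    # positive semidefinite, so it is a valid covariance matrix.
    factors = [
        [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(p)]
        for _ in range(c)
    ]

    rows = []
    for i in range(n):
        k = i % c  # assumption: near-equal cluster sizes
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # x = mean + A z has covariance A A^T when z is standard normal.
        x = [
            means[k][j] + sum(a * zj for a, zj in zip(factors[k][j], z))
            for j in range(p)
        ]
        rows.append(x + [k + 1])  # cluster label (assumed 1..c) in the last column
    return rows


rows = sample_gaussian_mixture(n=12, p=3, c=4)
```

Each returned row holds p feature values followed by the cluster label, mirroring the file layout of the published datasets. At the real sizes (p in the thousands) a pure-Python loop like this would be slow; it is meant to show the structure of the procedure only.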
The properties of the seven Gaussian mixture datasets:
Dataset | p | n | c
--- | --- | --- | ---
Gaussian.k8 | 657 | 81 | 8
Gaussian.k2 | 4,232 | 181 | 2
Gaussian.k7 | 4,514 | 128 | 7
Gaussian.k9 | 5,041 | 108 | 9
Gaussian.k6 | 5,176 | 143 | 6
Gaussian.k5 | 6,203 | 130 | 5
Gaussian.k4 | 6,615 | 168 | 4
The cluster label of each object is in the last column of the dataset.
Columns are separated by commas, and the files contain no column or row names.
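Given that layout (comma-separated, no header, label in the last column), a file could be read roughly as follows. This is a minimal sketch using only Python's standard library. The helper name `load_labeled_csv` is hypothetical, the sample values are made up, and labels are kept as text because their encoding is not documented here.

```python
import csv
import io


def load_labeled_csv(handle):
    """Read a header-less CSV and return (features, labels)."""
    features, labels = [], []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        # Every column except the last is a numeric feature.
        features.append([float(v) for v in row[:-1]])
        # The last column is the cluster label; kept as a string.
        labels.append(row[-1].strip())
    return features, labels


# Tiny in-memory stand-in for a real file (values are made up).
sample = io.StringIO("1.5,2.0,1\n3.25,4.0,2\n")
X, y = load_labeled_csv(sample)
```

For a downloaded file, the same function can be called with `open(path, newline="")`, where `path` is wherever the file was saved.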
Provider:
doi.org



