
Gaussian Blobs of Varying numbers of samples, centers and features

Source: ieee-dataport.org (indexed 2025-03-23)
Download link:
https://ieee-dataport.org/open-access/gaussian-blobs-varying-numbers-samples-centers-and-features
Resource description:
The dataset contains Gaussian blobs with varying numbers of samples, centers and features. The number of samples ranges from 500 to 50,000, the number of centers from 2 to 100, and the number of features from 2 to 2048. These sets of Gaussian blobs can be used to test the scalability and effectiveness of clustering algorithms.

The compressed sets contain two kinds of files. Files ending with "_X.csv" hold the data points, while files ending with "_y.csv" hold the corresponding class data. The filename of each Gaussian blob describes the blob, using the letters s for the number of samples, c for the number of centers, and f for the number of features. For example, the file "s50000_c50_f2048_X.csv" contains 50,000 samples with 2048 dimensions (features) and 50 centers, and the file "s50000_c50_f2048_y.csv" holds the class data for "s50000_c50_f2048_X.csv". The blob files are organized by number of samples. For example, the compressed file "10,000 datapoints set.zip" contains a collection of Gaussian blobs with 10,000 samples each and varying numbers of centers and features. The documentation section has a PDF document that lists the files inside each compressed file.
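The naming convention described above can be decoded programmatically. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `parse_blob_name` and `labels_path_for` are hypothetical, and the pattern assumes every file follows the "s<samples>_c<centers>_f<features>_X.csv" / "_y.csv" form shown in the example. It does not read the CSV contents, since the description does not say whether the files carry a header row.

```python
import re
from pathlib import Path

# Assumed pattern, based only on the example name "s50000_c50_f2048_X.csv".
_BLOB_NAME = re.compile(r"^s(\d+)_c(\d+)_f(\d+)_(X|y)\.csv$")


def parse_blob_name(filename):
    """Return the samples, centers, features and kind ('X' or 'y') encoded in a blob filename."""
    match = _BLOB_NAME.match(Path(filename).name)
    if match is None:
        raise ValueError(f"not a blob filename: {filename!r}")
    samples, centers, features, kind = match.groups()
    return {
        "samples": int(samples),
        "centers": int(centers),
        "features": int(features),
        "kind": kind,
    }


def labels_path_for(data_path):
    """Map an '_X.csv' data file to its companion '_y.csv' class-data file."""
    path = Path(data_path)
    suffix = "_X.csv"
    if not path.name.endswith(suffix):
        raise ValueError(f"not a data file: {data_path!r}")
    return path.with_name(path.name[: -len(suffix)] + "_y.csv")


if __name__ == "__main__":
    print(parse_blob_name("s50000_c50_f2048_X.csv"))
    print(labels_path_for("s50000_c50_f2048_X.csv"))
```

A benchmark script could use `parse_blob_name` to group files by sample count or feature count before timing a clustering algorithm on each blob.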

Provider:
IEEE Dataport