Finding Compact, Isolated Clusters in Data Lakes

Source: Mendeley Data (indexed 2026-04-18)

Download link:

https://data.mendeley.com/datasets/k96yvyjb68

Description:

These are the research materials that accompany the article "On Exploring Data Lakes by Finding Compact, Isolated Clusters", by Patricia Jiménez, Juan C. Roldán, and Rafael Corchuelo. The package includes the following:

- "data-lakes": a compressed archive organised by data lake. Each subfolder corresponds to a data lake, and each CSV file inside a subfolder corresponds to a dataset. The last column of every dataset is called "clazz" and is set to "0" in all cases. It is present only because many software packages expect such a column; its value is completely ignored.
- "results": the results of testing RóMULO and its competitors on the data lakes above. Each CSV file in this folder provides the experimental data gathered for all of the competitors on all of the datasets using one particular performance measure.
- "system": the Python code required to run and test RóMULO. A "launch.cmd" script launches the experimentation.

The implementations of the competitors are not included in the package and can be found elsewhere:

- GSPPCA is available from its authors at https://github.com/pamattei/GSPPCA.
- AffinityPropagation, MeanShift, and OPTICS-XI are available from scikit-learn at https://scikit-learn.org/stable/install.html.
- PQC is available from its authors at https://github.com/racaes/PQC.
- DCC is available from its authors at https://github.com/shahsohil/DCC.
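The folder layout described for "data-lakes" can be read with a few lines of Python. The following is only a minimal sketch, not part of the package: the function names load_dataset and load_data_lake are hypothetical, and it assumes that the archive has already been extracted, that each CSV file starts with a header row, and that the files are comma-separated and UTF-8 encoded. It uses only the standard library and drops the placeholder "clazz" column when it is the last one.

```python
import csv
from pathlib import Path


def load_dataset(csv_path):
    """Read one dataset and drop the trailing placeholder "clazz" column.

    Assumption: the first row of the file is a header row.
    Values are returned as strings; convert them as needed.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = [row for row in reader if row]  # skip empty lines
    if header and header[-1] == "clazz":
        header = header[:-1]
        rows = [row[:-1] for row in rows]
    return header, rows


def load_data_lake(lake_dir):
    """Map each CSV file name in one data-lake subfolder to (header, rows)."""
    return {
        path.name: load_dataset(path)
        for path in sorted(Path(lake_dir).glob("*.csv"))
    }
```

Calling load_data_lake on one extracted subfolder returns a dictionary keyed by file name, so each dataset of that data lake can be inspected or passed to a clustering routine without the unused "clazz" column.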

Created:

2021-08-02
