Data lakes for clustering
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/js8df95fzc
Description:
This dataset provides the online materials that accompany the article "RÓMULO: A Clustering Proposal in the Context of Data Lakes", by Patricia Jiménez, Juan C. Roldán, and Rafael Corchuelo.
The materials are organised into the following folders:
- "data-lakes": each subfolder corresponds to a data lake, and each CSV file inside a data lake corresponds to a dataset. The last column of every dataset is called "clazz", but it is set to "0" in all cases. A few of the original datasets had a class, but it was removed to ensure that neither RÓMULO nor its competitors use it, since they are all unsupervised proposals.
- "results": it provides the results of testing RÓMULO and its competitors on the previous data lakes. The results consist of several "*-results.csv" files that provide effectiveness and efficiency results for each proposal used in the experimentation.
- "system": it provides the Python code required to run and test RÓMULO. There is a "launch.cmd" script that launches the experimentation.
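The folder layout described above can be traversed with a few lines of Python. The following is only a minimal sketch, not part of the published materials: the function name `load_data_lakes` is hypothetical, and it assumes that every CSV file is comma-separated, UTF-8 encoded, and starts with a header row whose last entry is "clazz". Values are kept as strings because the description does not state the column types.

```python
import csv
from pathlib import Path


def load_data_lakes(root):
    """Return {data lake name: {dataset name: (header, rows)}}.

    Hypothetical helper. Assumes each CSV file has a header row and is
    comma-separated; the constant "clazz" column is dropped when it is
    the last column, since it carries no class information.
    """
    lakes = {}
    for lake_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        datasets = {}
        for csv_path in sorted(lake_dir.glob("*.csv")):
            with csv_path.open(newline="", encoding="utf-8") as handle:
                rows = list(csv.reader(handle))
            if not rows:
                continue  # skip empty files
            header, body = rows[0], rows[1:]
            if header and header[-1] == "clazz":
                # Remove the placeholder class column from header and rows.
                header = header[:-1]
                body = [row[:-1] for row in body]
            datasets[csv_path.stem] = (header, body)
        lakes[lake_dir.name] = datasets
    return lakes
```

Calling `load_data_lakes("data-lakes")` from the root of the unpacked materials would then yield one entry per data lake, each mapping dataset names to their header and rows without the "clazz" column.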
COMPETITORS
-------------------
The implementations of AffinityPropagation, MeanShift, and OPTICS-XI are available in scikit-learn. The implementation of GSPPCA is available from the authors at https://github.com/pamattei/GSPPCA. The implementation of PQC is available from the authors at https://github.com/racaes/PQC.
Created:
2021-02-08



