
Data lakes for clustering

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://data.mendeley.com/datasets/js8df95fzc
Description:
This dataset contains the online materials that accompany the article "RÓMULO: A Clustering Proposal in the Context of Data Lakes", by Patricia Jiménez, Juan C. Roldán, and Rafael Corchuelo. The materials are organised into the following folders:

- "data-lakes": each subfolder corresponds to a data lake, and each CSV file inside a data lake corresponds to a dataset. The last column of every dataset is called "clazz", but it is set to "0" in all cases. A few of the original datasets had a class, but it was removed so that neither RÓMULO nor its competitors can use it, since they are all unsupervised proposals.
- "results": the results of testing RÓMULO and its competitors on the data lakes above. They consist of several "*-results.csv" files that provide effectiveness and efficiency results for each proposal used in the experimentation.
- "system": the Python code required to run and test RÓMULO. A "launch.cmd" script launches the experimentation.

COMPETITORS
-------------------

The implementations of AffinityPropagation, MeanShift, and OPTICS-XI are available in scikit-learn. The implementation of GSPPCA is available from the authors at https://github.com/pamattei/GSPPCA. The implementation of PQC is available from the authors at https://github.com/racaes/PQC.
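As an illustration of the "data-lakes" folder layout described above, the following is a minimal sketch of how one data lake might be read in Python. It is not part of the authors' "system" code. The function name `load_data_lake` is hypothetical, and the sketch assumes that each CSV file is comma-separated, UTF-8 encoded, and starts with a header row; none of these details are stated in the description.

```python
import csv
from pathlib import Path


def load_data_lake(lake_dir):
    """Read every CSV file in one data-lake folder (hypothetical helper).

    Returns a dict mapping each file name to a (header, rows) pair, where
    rows are lists of strings. Assumes comma-separated, UTF-8 files whose
    first line is a header; these are assumptions, not documented facts.
    """
    lake = {}
    for csv_path in sorted(Path(lake_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)
            if header is None:
                continue  # skip empty files
            rows = [row for row in reader if row]
        # Per the description, the last column is "clazz" and is always "0",
        # so it carries no information and is dropped here.
        if header[-1] == "clazz":
            header = header[:-1]
            rows = [row[:-1] for row in rows]
        lake[csv_path.name] = (header, rows)
    return lake
```

Calling `load_data_lake` on one subfolder of "data-lakes" would return one entry per dataset in that lake. Values are kept as strings, so any numeric conversion is left to the caller.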
Created:
2021-02-08