
Big Data Machine Learning Benchmark on Spark

Source: IEEE. Updated: 2019-06-06. Indexed: 2026-04-17.
Download link:
https://ieee-dataport.org/open-access/big-data-machine-learning-benchmark-spark
Description:
We introduce a benchmark of distributed algorithm execution over big data. The datasets consist of metrics describing the computational impact (resource usage) of eleven well-known machine learning techniques on a real computing cluster, expressed as system-agnostic resource indicators: CPU consumption, memory usage, operating system process load, network traffic, and I/O operations. The metrics were collected every five seconds for each algorithm at five different data volume scales, totaling 275 distinct datasets. The tested scenarios cover regression, clustering, classification, dimensionality reduction, and collaborative filtering problems. We ran the experiments on 2.15 TB of synthetic data generated with Intel HiBench, on a cluster with 128 cores and 848 GB of RAM managed by the Apache Spark framework. We hope the scientific community can use these datasets to gain insight into running algorithms on big data processing platforms.
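The description does not spell out how the 275 datasets break down. One reading that matches the stated numbers is one dataset per combination of algorithm, data volume scale, and indicator category (11 × 5 × 5 = 275). The sketch below enumerates that grid in Python. The grouping itself is an assumption, and the algorithm and scale labels are hypothetical placeholders, since the description names neither; only the five indicator categories come from the text.

```python
from itertools import product

# Hypothetical placeholder labels: the description does not name the
# eleven algorithms or the five data volume scales.
algorithms = [f"algorithm_{i:02d}" for i in range(1, 12)]
scales = [f"scale_{i}" for i in range(1, 6)]

# Indicator categories as listed in the description.
indicators = ["cpu", "memory", "os_process_load", "network_traffic", "io_operations"]

# Assumption: one dataset per (algorithm, scale, indicator) combination.
combinations = list(product(algorithms, scales, indicators))
dataset_count = len(combinations)

print(dataset_count)
```

If the files are organized differently (for example, per cluster node rather than per indicator), the same enumeration applies with the third axis swapped out.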
Provided by:
Federal University of Pernambuco
Created:
2019-06-06