Big Data Machine Learning Benchmark on Spark
Source: IEEE | Updated: 2019-06-06 | Indexed: 2026-04-17
Download link:
https://ieee-dataport.org/open-access/big-data-machine-learning-benchmark-spark
Description:
We introduce a benchmark of distributed algorithm execution over big data. The datasets consist of metrics on the computational impact (resource usage) of eleven well-known machine learning techniques on a real computational cluster, with respect to five system-resource-agnostic indicators: CPU consumption, memory usage, operating system process load, network traffic, and I/O operations. The metrics were collected every five seconds for each algorithm at five different data volume scales, totaling 275 distinct datasets. The tested scenarios cover regression, clustering, classification, dimensionality reduction, and collaborative filtering problems. We ran the experiments on 2.15 TB of synthetic data generated with Intel HiBench, on a cluster of 128 cores and 848 GB of RAM managed by the Apache Spark framework. We hope the scientific community can use these datasets to gain insight into running algorithms on big data processing platforms.
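The figure of 275 datasets is consistent with one dataset per combination of algorithm, data scale, and indicator (11 x 5 x 5). The minimal Python sketch below enumerates that grid under this assumed reading. The algorithm and scale labels, the indicator keys, and the function name `enumerate_datasets` are hypothetical placeholders, not names taken from the published files.

```python
from itertools import product

# Counts stated in the description.
N_ALGORITHMS = 11
N_SCALES = 5

# Hypothetical short keys for the five indicators named in the description.
INDICATORS = [
    "cpu",
    "memory",
    "os_process_load",
    "network_traffic",
    "io_operations",
]


def enumerate_datasets(n_algorithms=N_ALGORITHMS, n_scales=N_SCALES,
                       indicators=INDICATORS):
    """Return one (algorithm, scale, indicator) tuple per assumed dataset."""
    algorithms = [f"algorithm_{i}" for i in range(1, n_algorithms + 1)]
    scales = [f"scale_{i}" for i in range(1, n_scales + 1)]
    return list(product(algorithms, scales, indicators))


datasets = enumerate_datasets()
print(len(datasets))  # 11 * 5 * 5 = 275
```

If the published files are organized differently (for example, one file per algorithm and scale holding all five indicators), the grid above would need to be adjusted accordingly.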
Provider:
Federal University of Pernambuco
Created:
2019-06-06



