Big Data Machine Learning Benchmark on Spark

Name: Big Data Machine Learning Benchmark on Spark
Creator: Federal University of Pernambuco
Published: 2019-06-06
License: No description available

Source: IEEE. Updated 2019-06-06; indexed 2026-04-17.

Download link:

https://ieee-dataport.org/open-access/big-data-machine-learning-benchmark-spark


Description:

We introduce a benchmark of distributed algorithm execution over big data. The datasets consist of metrics describing the computational impact (resource usage) of eleven well-known machine learning techniques on a real computing cluster, expressed as system-resource-agnostic indicators: CPU consumption, memory usage, operating system process load, network traffic, and I/O operations. The metrics were collected every five seconds for each algorithm at five different data volume scales, totaling 275 distinct datasets. The tested scenarios cover regression, clustering, classification, dimensionality reduction, and collaborative filtering problems. We performed the experiments on 2.15 TB of synthetic data produced with Intel HiBench, on a cluster with 128 cores and 848 GB of RAM managed by the Apache Spark framework. We hope the scientific community can use these datasets to gain insights into running algorithms on big data processing platforms.
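
The total of 275 datasets is consistent with one dataset per combination of algorithm, data volume scale, and indicator group (11 x 5 x 5). The description does not state this breakdown, so the Python sketch below treats it as an assumption. The algorithm and scale labels are hypothetical placeholders, since only their counts are given, and the sample-count helper assumes one sample at the start of a run and one every five seconds afterwards.

```python
from itertools import product

# Indicator groups named in the description (short labels are our own).
INDICATORS = ["cpu", "memory", "os_process_load", "network", "io"]

# Hypothetical placeholder labels: the description gives only the counts
# (eleven algorithms, five data volume scales), not their names.
ALGORITHMS = [f"algorithm_{i:02d}" for i in range(1, 12)]
SCALES = [f"scale_{i}" for i in range(1, 6)]


def dataset_grid():
    """Return every (algorithm, scale, indicator) combination.

    Assumption: each combination corresponds to one dataset.
    """
    return list(product(ALGORITHMS, SCALES, INDICATORS))


def samples_for(duration_seconds, interval_seconds=5):
    """Estimate the number of samples recorded during one run.

    Assumption: one sample at t=0, then one per full interval.
    """
    return duration_seconds // interval_seconds + 1


if __name__ == "__main__":
    print(len(dataset_grid()))   # number of combinations in the grid
    print(samples_for(600))      # samples in a hypothetical 10-minute run
```

Under these assumptions the grid size matches the stated total, and a run's length in seconds gives a rough estimate of how many rows to expect in each metric series.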

Provider:

Federal University of Pernambuco

Created:

2019-06-06
