
ALOJA

Source: arXiv | Updated: 2015-11-06 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/1511.02037v1
Description:
ALOJA is an open, vendor-neutral repository developed by the Barcelona Supercomputing Center in collaboration with Microsoft, focused on performance benchmarking and predictive analytics for Hadoop environments. It collects execution data from more than 40,000 Hadoop jobs, including performance details and configuration parameters, with the goal of automating the cost-effectiveness analysis of big data deployments through machine learning. The dataset supports comparisons across hardware configurations, parameters, and cloud services, and comes with a set of tools for evaluating the cost-effectiveness of different configurations. The ALOJA project also developed ALOJA-ML, a tool that learns from historical execution data to predict the execution behavior of new configurations, helping users optimize the design and deployment of big data applications for both performance and cost.
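To make the idea of learning from historical executions concrete, here is a minimal sketch of predicting the execution time of an unseen configuration and converting it into a cost. It is an illustration only: the nearest-neighbor predictor is a stand-in chosen for brevity and is not claimed to be the method ALOJA-ML uses, and the field names, sample values, and price are hypothetical rather than taken from the ALOJA schema or data.

```python
from statistics import mean

# Hypothetical history: (configuration, observed execution time in seconds).
# Field names and values are invented for illustration; not the ALOJA schema.
HISTORY = [
    ({"benchmark": "terasort", "disk": "SSD", "mappers": 8}, 400.0),
    ({"benchmark": "terasort", "disk": "SSD", "mappers": 4}, 500.0),
    ({"benchmark": "terasort", "disk": "HDD", "mappers": 8}, 700.0),
    ({"benchmark": "terasort", "disk": "HDD", "mappers": 4}, 900.0),
]


def mismatches(a, b):
    """Hamming distance: number of configuration fields whose values differ."""
    return sum(1 for key in a if a[key] != b.get(key))


def predict_time(config, history, k=2):
    """Average the execution time of the k most similar past runs."""
    ranked = sorted(history, key=lambda run: mismatches(config, run[0]))
    return mean(time_s for _, time_s in ranked[:k])


def run_cost(time_s, price_per_hour):
    """Cost of one run on a cluster billed at price_per_hour (assumed pricing model)."""
    return time_s / 3600.0 * price_per_hour


# A configuration that never appears in the history (16 mappers).
new_config = {"benchmark": "terasort", "disk": "SSD", "mappers": 16}
predicted = predict_time(new_config, HISTORY)
```

Here `run_cost(predicted, price_per_hour)` would turn the predicted time into a comparable cost figure for a given cluster price, which is the kind of cost-effectiveness comparison the description refers to.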
Provided by:
Barcelona Supercomputing Center - Polytechnic University of Catalonia
Created:
2015-11-06