DLBench
Source: arXiv; updated 2021-10-04; indexed 2024-06-21
Download link:
https://github.com/Pegdwende44/DLBench
Description:
DLBench is a data lake benchmark created by the University of Lyon. It comprises 50,000 scientific articles and 5,000 tabular files, about 62 GB in total. The data come from HAL and the Government of Canada's open data, extracted and assembled by dedicated scripts. The benchmark is intended for evaluating and comparing data lake implementations that support textual and/or tabular content, addressing the lack of a standard way to assess and compare such systems.
Provider:
University of Lyon
Created:
2021-10-04



