
Retrieve, Merge, Predict: Augmenting Tables with Data Lakes

NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/10600047
Description:
Files composing the YADL data lake, for the paper "Retrieve, Merge, Predict: Augmenting Tables with Data Lakes (Experiment, Analysis & Benchmark Paper)".

We present an in-depth analysis of data discovery for analytics in data lakes, focusing on table augmentation for given machine learning tasks. We analyze alternative methods used in the three key steps: retrieving joinable tables, merging information, and predicting with the resultant table. As data lakes, the paper uses YADL (Yet Another Data Lake), a novel dataset developed as a tool for benchmarking this data discovery task, and Open Data US, a well-referenced real data lake. Through systematic exploration on both lakes, our study outlines the importance of accurately retrieving join candidates, and the efficiency of simple aggregation methods. We report new insights on the benefits of existing solutions and on their limitations, aiming at guiding future research in this space.

Archives provided here follow the notation used for the experiments, which is different from what is reported in the paper. The four YADL versions available here are:
"binary_update" (YADL Binary)
"wordnet_full" (YADL Base)
"wordnet_vldb_10" (YADL 10k)
"wordnet_vldb_50" (YADL 50k)
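The first two steps named in the description (retrieving joinable tables, then merging with a simple aggregation) can be illustrated with a minimal sketch. This is not the paper's code: the toy tables, the function names `containment`, `retrieve`, and `merge_mean`, the containment-based score, and the 0.5 threshold are all assumptions made for illustration, and the prediction step is left out.

```python
from statistics import mean

def containment(query_values, candidate_values):
    """Fraction of distinct query keys that also appear in the candidate column."""
    q = set(query_values)
    if not q:
        return 0.0
    return len(q & set(candidate_values)) / len(q)

def retrieve(base, key, lake, threshold=0.5):
    """Return (table, column, score) for lake columns covering enough of base[key].

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    query = [row[key] for row in base]
    hits = []
    for name, rows in lake.items():
        columns = rows[0].keys() if rows else []
        for col in columns:
            score = containment(query, [r[col] for r in rows])
            if score >= threshold:
                hits.append((name, col, score))
    return sorted(hits, key=lambda h: h[2], reverse=True)

def merge_mean(base, key, rows, col, feature):
    """Left join with mean aggregation, so the base table keeps its row count."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[feature])
    merged = []
    for row in base:
        values = groups.get(row[key])
        merged.append({**row, feature: mean(values) if values else None})
    return merged

# Hypothetical toy data: a base table and a two-table "lake".
base = [
    {"city": "Paris", "target": 1},
    {"city": "Rome", "target": 0},
    {"city": "Oslo", "target": 1},
]
lake = {
    "population": [
        {"town": "Paris", "pop": 2.0},
        {"town": "Paris", "pop": 4.0},
        {"town": "Rome", "pop": 2.8},
    ],
    "unrelated": [{"name": "x", "val": 1.0}],
}

candidates = retrieve(base, "city", lake)
augmented = merge_mean(base, "city", lake["population"], "town", "pop")
```

Aggregating before joining keeps one row per base record even when the candidate table has several matches per key, which is what makes the merged table usable as input to a downstream predictor.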
Created:
2024-07-04