hazyresearch/evaporate

Name: hazyresearch/evaporate
Creator: hazyresearch
Published: 2024-02-20 00:04:36
License: Not specified

Source: Hugging Face. Updated 2024-02-20. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/hazyresearch/evaporate

Resource description:

# Evaporate

Datasets for paper "Evaporate: Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes".

The best way to use this data is by cloning:

```
git lfs install
git clone https://huggingface.co/datasets/hazyresearch/evaporate
```

We can then unzip everything using this code snippet:

```
import os

data_path = "evaporate/data"
# list paths in data_path
data_path_files = os.listdir(data_path)
for path in data_path_files:
    sub_path = f"{data_path}/{path}"
    # tar unzip 'docs.tar.gz' in the sub_paths
    if os.path.exists(f"{sub_path}/docs.tar.gz"):
        os.system(f"tar -xvf {sub_path}/docs.tar.gz -C {sub_path}")
```
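The unzip step can also be done without shelling out to `tar`, which may help on systems where the `tar` command is unavailable. The following is a minimal sketch using Python's standard `tarfile` module, not the authors' own script. The function name `extract_docs` is hypothetical, and the default path assumes the repository was cloned into `evaporate/` as in the command quoted in the description.

```python
import tarfile
from pathlib import Path


def extract_docs(data_path="evaporate/data"):
    """Extract docs.tar.gz inside each entry of data_path, if present.

    Returns the names of the sub-directories whose archive was extracted.
    Raises FileNotFoundError if data_path itself does not exist.
    """
    extracted = []
    for sub_path in sorted(Path(data_path).iterdir()):
        archive = sub_path / "docs.tar.gz"
        if not archive.is_file():
            continue  # nothing to extract in this entry
        # "r:*" lets tarfile detect the compression on its own
        with tarfile.open(archive, "r:*") as tar:
            # extract next to the archive, like `tar -xvf ... -C sub_path`
            tar.extractall(path=sub_path)
        extracted.append(sub_path.name)
    return extracted
```

Calling `extract_docs()` from the directory that contains the clone should unpack every archive in place. As with any archive, it is only sensible to extract files that come from a source you trust.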

Provided by:

hazyresearch

Summary of original information

Dataset overview

Dataset name

Name: Evaporate

Dataset purpose

Purpose: supports the research in the paper "Evaporate: Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes".

How to obtain the dataset

Method: clone the dataset with Git LFS, using the following commands:

    git lfs install
    git clone https://huggingface.co/datasets/hazyresearch/evaporate
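If the repository is cloned before Git LFS is set up, large files are typically left as small text pointer files rather than real archives, and extraction then fails. The sketch below checks for that case. It assumes the `docs.tar.gz` archives are stored through Git LFS (not confirmed by the listing), and the helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical.

```python
from pathlib import Path

# Git LFS pointer files are plain text and begin with this line
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like an un-downloaded Git LFS pointer."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(data_path="evaporate/data"):
    """List docs.tar.gz files under data_path that are still LFS pointers."""
    return sorted(
        str(p) for p in Path(data_path).rglob("docs.tar.gz") if is_lfs_pointer(p)
    )
```

If `find_lfs_pointers()` returns a non-empty list, running `git lfs pull` inside the cloned repository should fetch the actual file contents.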

How to extract the dataset

Method: use a Python script to extract the docs.tar.gz file in each dataset sub-directory. Example script:

    import os

    data_path = "evaporate/data"
    # list the file paths in data_path
    data_path_files = os.listdir(data_path)
    for path in data_path_files:
        sub_path = f"{data_path}/{path}"
        # extract the docs.tar.gz file in the sub-path
        if os.path.exists(f"{sub_path}/docs.tar.gz"):
            os.system(f"tar -xvf {sub_path}/docs.tar.gz -C {sub_path}")
