hazyresearch/evaporate
Source: Hugging Face. Updated: 2024-02-20. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/hazyresearch/evaporate
Description:
# Evaporate
Datasets for the paper "Evaporate: Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes".
The best way to use this data is by cloning:
```
git lfs install
git clone https://huggingface.co/datasets/hazyresearch/evaporate
```
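If `git lfs install` was skipped before cloning, Git typically leaves small text pointer files in place of the real archives. The sketch below is one way to check for that after the clone. It assumes the archives are named `docs.tar.gz` and sit one level below `evaporate/data`, as in the extraction snippet in this README; the helper names `is_lfs_pointer` and `find_unfetched_archives` are hypothetical and not part of the dataset.

```python
import os

# Git LFS pointer files are small text files that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(file_path):
    """Return True if file_path looks like a Git LFS pointer, not real data."""
    with open(file_path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unfetched_archives(data_path="evaporate/data"):
    """List docs.tar.gz files under data_path that are still LFS pointers."""
    unfetched = []
    if not os.path.isdir(data_path):
        return unfetched
    for name in sorted(os.listdir(data_path)):
        # Assumed layout: <data_path>/<name>/docs.tar.gz
        archive = os.path.join(data_path, name, "docs.tar.gz")
        if os.path.isfile(archive) and is_lfs_pointer(archive):
            unfetched.append(archive)
    return unfetched
```

An empty list from `find_unfetched_archives()` suggests the archives were downloaded in full; any paths it returns can usually be fetched by running `git lfs pull` inside the cloned repository.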
We can then unzip everything using this code snippet:
```
import os

data_path = "evaporate/data"
# list paths in data_path
data_path_files = os.listdir(data_path)
for path in data_path_files:
    sub_path = f"{data_path}/{path}"
    # tar unzip 'docs.tar.gz' in the sub_paths
    if os.path.exists(f"{sub_path}/docs.tar.gz"):
        os.system(f"tar -xvf {sub_path}/docs.tar.gz -C {sub_path}")
```
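The snippet above shells out to the `tar` command, which may not be available on every system. As an alternative, here is a minimal sketch of the same loop using Python's standard `tarfile` module. It is not the authors' method; the function name `extract_docs` and its return value are assumptions made for illustration.

```python
import os
import tarfile


def extract_docs(data_path="evaporate/data"):
    """Extract every <data_path>/<name>/docs.tar.gz into its own sub-directory.

    Returns the list of sub-directories in which an archive was extracted.
    """
    extracted = []
    for name in sorted(os.listdir(data_path)):
        sub_path = os.path.join(data_path, name)
        archive = os.path.join(sub_path, "docs.tar.gz")
        # Skip entries that have no docs.tar.gz archive.
        if os.path.isfile(archive):
            with tarfile.open(archive, "r:gz") as tar:
                tar.extractall(sub_path)
            extracted.append(sub_path)
    return extracted
```

Calling `extract_docs("evaporate/data")` from the directory that contains the clone should leave the extracted files next to each archive, matching the `-C {sub_path}` behavior of the original command.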
Provider:
hazyresearch
Summary of the original information
Dataset overview
Dataset name
- Name: Evaporate
Dataset purpose
- Purpose: Accompanies the paper "Evaporate: Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes".
How to obtain the dataset
- Method: Clone the dataset with Git LFS by running `git lfs install`, then `git clone https://huggingface.co/datasets/hazyresearch/evaporate`.
How to extract the dataset
- Method: Use a Python script to extract the `docs.tar.gz` files in the dataset. The script lists the entries of `evaporate/data` with `os.listdir` and, for each sub-path that contains a `docs.tar.gz`, runs `tar -xvf {sub_path}/docs.tar.gz -C {sub_path}` through `os.system`.



