
mango-ttic/data-intermediate

Source: Hugging Face. Updated 2024-05-27. Indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/mango-ttic/data-intermediate
Description:
# data-intermediate

If you are looking for our test-ready version, please refer to [mango-ttic/data](https://huggingface.co/datasets/mango-ttic/data).

Find more about us at [mango.ttic.edu](https://mango.ttic.edu).

## Folder Structure

Each folder inside `data-intermediate` contains all intermediate files we used during data annotation and generation. Here is the tree structure for the game `night`, stored in `data-intermediate/night`.

```bash
data-intermediate/night/
├── night.all2all.json          # all simple paths between any 2 nodes
├── night.all_pairs.json        # all connectivity between any 2 nodes
├── night.anno2code.json        # annotation to codename mapping
├── night.code2anno.json        # codename to annotation mapping
├── night.edges.json            # list of all edges
├── night.map.human             # human map derived from human annotation
├── night.map.machine           # machine map derived from exported action sequences
├── night.map.reversed          # reverse map derived from human annotation map
├── night.moves                 # list of mentioned actions
├── night.nodes.json            # list of all nodes
├── night.valid_moves.csv       # human annotation
├── night.walkthrough           # enriched walkthrough exported from Jericho simulator
└── night.walkthrough_acts      # action sequences exported from Jericho simulator
```

## Variations

### 70-step vs all-step version

In our paper, we benchmark using the first 70 steps of the walkthrough from each game. We also provide all-step versions of both the `data` and `data-intermediate` collections.

* **70-step** `data-intermediate-70steps.tar.zst`: contains the first 70 steps of each walkthrough. If the complete walkthrough is shorter than 70 steps, then all steps are used.
* **All-step** `data-intermediate.tar.zst`: contains all steps of each walkthrough.

### Word-only & Word+ID

* **Word-only** `data-intermediate.tar.zst`: nodes are annotated with additional descriptive text to distinguish different locations with similar names.
* **Word + Object ID** `data-intermediate-objid.tar.zst`: variation of the word-only version, where nodes are labeled using minimally fixed names with the object ID from the Jericho simulator.
* **Word + Random ID** `data-intermediate-randid.tar.zst`: variation of the Jericho ID version, where the Jericho object ID is replaced with a randomly generated integer.

We primarily rely on the **word-only** version as the benchmark, but also provide the word+ID versions for diverse benchmark settings.

## How to use

We use `data-intermediate.tar.zst` as an example here.

### 1. Download from Hugging Face

#### By direct download

You can selectively download the variation of your choice.

![](direct_download_data-intermediate.png)

#### By git

Make sure you have [git-lfs](https://git-lfs.com) installed.

```bash
git lfs install
git clone https://huggingface.co/datasets/mango-ttic/data-intermediate

# or, use hf-mirror if your connection to huggingface.co is slow
# git clone https://hf-mirror.com/datasets/mango-ttic/data-intermediate
```

If you want to clone without the large files (just their pointers):

```bash
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/mango-ttic/data-intermediate

# or, use hf-mirror if your connection to huggingface.co is slow
# GIT_LFS_SKIP_SMUDGE=1 git clone https://hf-mirror.com/datasets/mango-ttic/data-intermediate
```

### 2. Decompress

Because some JSON files are huge, we use tar.zst to package the data efficiently.

Silently decompress:

```bash
tar -I 'zstd -d' -xf data-intermediate.tar.zst
```

Or, verbosely decompress:

```bash
zstd -d -c data-intermediate.tar.zst | tar -xvf -
```
Provided by:
mango-ttic
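Summary of original information

data-intermediate

Folder structure

Each folder inside data-intermediate contains all intermediate files used during data annotation and generation. Here is the tree structure for the game in data-intermediate/night:

    data-intermediate/night/
    ├── night.all2all.json          # all simple paths between any 2 nodes
    ├── night.all_pairs.json        # all connectivity between any 2 nodes
    ├── night.anno2code.json        # annotation to codename mapping
    ├── night.code2anno.json        # codename to annotation mapping
    ├── night.edges.json            # list of all edges
    ├── night.map.human             # human map derived from human annotation
    ├── night.map.machine           # machine map derived from exported action sequences
    ├── night.map.reversed          # reverse map derived from human annotation map
    ├── night.moves                 # list of mentioned actions
    ├── night.nodes.json            # list of all nodes
    ├── night.valid_moves.csv       # human annotation
    ├── night.walkthrough           # enriched walkthrough exported from Jericho simulator
    └── night.walkthrough_acts      # action sequences exported from Jericho simulator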
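After extraction, the listing above can double as a completeness check. The following is a minimal sketch, assuming every game folder follows the same `<game>.<suffix>` naming shown for `night` (the description only shows `night`); `missing_files` is a hypothetical helper name, not part of the dataset tooling.

```bash
# Sketch: print each of the 13 documented files that is missing for one game.
# Assumption: other games use the same <game>.<suffix> naming as `night`.
# `missing_files` is a hypothetical helper, not shipped with the dataset.
missing_files() {
  local root="$1" game="$2" suffix
  for suffix in all2all.json all_pairs.json anno2code.json code2anno.json \
                edges.json map.human map.machine map.reversed moves \
                nodes.json valid_moves.csv walkthrough walkthrough_acts; do
    # Print the expected file name only when it does not exist.
    [ -e "$root/$game/$game.$suffix" ] || echo "$game.$suffix"
  done
}

# Usage (assumes the archive was extracted in the current directory):
#   missing_files data-intermediate night
```

An empty output means all documented files are present for that game.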

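Variations

70-step vs all-step version

In our paper, we benchmark using the first 70 steps of the walkthrough from each game. We also provide all-step versions of both the data and data-intermediate collections.

  • 70-step data-intermediate-70steps.tar.zst: contains the first 70 steps of each walkthrough. If the complete walkthrough is shorter than 70 steps, then all steps are used.

  • All-step data-intermediate.tar.zst: contains all steps of each walkthrough.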
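The truncation rule above (first 70 steps, or everything when the walkthrough is shorter) matches the behavior of `head`. The sketch below is only an illustration: it assumes a file with one step per line, which the description does not confirm for `night.walkthrough_acts`, and `first_steps` is a hypothetical helper name.

```bash
# Sketch: keep at most N leading lines of a file (default 70).
# Assumption: one walkthrough step per line; the real file layout may differ.
# `first_steps` is a hypothetical helper, not how the authors built the archives.
first_steps() {
  local file="$1" n="${2:-70}"
  # head prints the whole file when it has fewer than n lines.
  head -n "$n" "$file"
}

# Usage:
#   first_steps data-intermediate/night/night.walkthrough_acts 70
```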

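Word-only & Word+ID

  • Word-only data-intermediate.tar.zst: nodes are annotated with additional descriptive text to distinguish different locations with similar names.

  • Word + Object ID data-intermediate-objid.tar.zst: variation of the word-only version, where nodes are labeled using minimally fixed names with the object ID from the Jericho simulator.

  • Word + Random ID data-intermediate-randid.tar.zst: variation of the Jericho ID version, where the Jericho object ID is replaced with a randomly generated integer.

We primarily rely on the word-only version as the benchmark, but also provide the word+ID versions for diverse benchmark settings.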
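The four archive names listed in the two variation sections can be collected in one lookup. This is a small sketch for scripting convenience; the short keys (`all-step`, `70-step`, `objid`, `randid`) and the function name `archive_for` are hypothetical labels chosen here, while the file names are the ones given above.

```bash
# Sketch: map a variation label to the archive name listed in this description.
# The labels and `archive_for` are hypothetical; only the file names come from
# the dataset description.
archive_for() {
  case "$1" in
    all-step|word-only) echo "data-intermediate.tar.zst" ;;
    70-step)            echo "data-intermediate-70steps.tar.zst" ;;
    objid)              echo "data-intermediate-objid.tar.zst" ;;
    randid)             echo "data-intermediate-randid.tar.zst" ;;
    *) echo "unknown variation: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   archive_for 70-step
```

Note that `all-step` and `word-only` resolve to the same file, because `data-intermediate.tar.zst` is both the all-step and the word-only archive.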

How to use

We use data-intermediate.tar.zst as an example here.

1. Download from Hugging Face

By direct download

You can selectively download the variation of your choice.

By git

Make sure you have git-lfs installed.

    git lfs install
    git clone https://huggingface.co/datasets/mango-ttic/data-intermediate

    # or, use hf-mirror if your connection to huggingface.co is slow
    # git clone https://hf-mirror.com/datasets/mango-ttic/data-intermediate

If you want to clone without the large files (just their pointers):

    GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/mango-ttic/data-intermediate

    # or, use hf-mirror if your connection to huggingface.co is slow
    # GIT_LFS_SKIP_SMUDGE=1 git clone https://hf-mirror.com/datasets/mango-ttic/data-intermediate
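A pointer-only clone leaves each archive as a small text stub whose first line starts with `version https://git-lfs.github.com/spec/v1`. The sketch below checks for that stub so a script can fetch just the archive it needs with `git lfs pull --include`; `is_lfs_pointer` is a hypothetical helper name, and the check reads only the first 100 bytes.

```bash
# Sketch: succeed (exit 0) if FILE is still a Git LFS pointer stub.
# `is_lfs_pointer` is a hypothetical helper, not part of git-lfs.
is_lfs_pointer() {
  # Pointer files are tiny text files that begin with the LFS spec line.
  head -c 100 "$1" 2>/dev/null \
    | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Usage, inside a clone made with GIT_LFS_SKIP_SMUDGE=1:
#   is_lfs_pointer data-intermediate.tar.zst \
#     && git lfs pull --include="data-intermediate.tar.zst"
```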

2. Decompress

Because some JSON files are huge, we use tar.zst to package the data efficiently.

Silently decompress:

    tar -I 'zstd -d' -xf data-intermediate.tar.zst

Or, verbosely decompress:

    zstd -d -c data-intermediate.tar.zst | tar -xvf -
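Since some JSON files are large, it may be useful to unpack a single game rather than the whole archive. The following sketch reuses the `zstd | tar` pipeline above and passes a member path to `tar`. It assumes the archive's top-level directory is `data-intermediate/`, as the folder listing suggests but the description does not state; `extract_game` is a hypothetical helper name.

```bash
# Sketch: extract only one game's folder from a .tar.zst archive.
# Assumption: members are stored under data-intermediate/<game>/.
# `extract_game` is a hypothetical helper, not part of the dataset tooling.
extract_game() {
  local archive="$1" game="$2"
  # Decompress to stdout and let tar extract just the requested member path.
  zstd -d -c "$archive" | tar -xf - "data-intermediate/$game"
}

# Usage:
#   extract_game data-intermediate.tar.zst night
# To inspect member paths first without extracting anything:
#   zstd -d -c data-intermediate.tar.zst | tar -tf - | head
```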
