zdy023/WikiHow-taskset
Source: Hugging Face. Updated 2024-04-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/zdy023/WikiHow-taskset
---
license: apache-2.0
---
(Works with [Mobile-Env v3.x](https://github.com/X-LANCE/Mobile-Env/tree/v3.0).)
# WikiHow Task Set
The WikiHow task set is an InfoUI interaction task set based on
[Mobile-Env](https://github.com/X-LANCE/Mobile-Env) proposed in [*Mobile-Env:
An Evaluation Platform and Benchmark for Interactive Agents in LLM
Era*](https://arxiv.org/abs/2305.08144).
[WikiHow](https://www.wikihow.com/Main-Page) is a collaborative wiki site about
various real-life tips with more than 340,000 online articles. To construct the
task set, 107,448 pages were crawled, and the dumped website data occupy about
88 GiB in total.
Several task definition templates are designed according to the functions of
the WikiHow app, and task definitions are instantiated through the template
toolkit in Mobile-Env. 577 tasks are sampled from the extended set and form the
*canonical set* (`wikihow-canonical.tar.xz`). Owing to budget limits, only 150
tasks were tested with the proposed LLM-based agent. These 150 tasks are given
in `wikihow-microcanon.tar.xz` and are called the *canonical subset* or the
*micro canonical set*.
### Website Data Replay
The replay script for [mitmproxy](https://mitmproxy.org/) is given as
`replay_url.py`. To use this replay script, the information retrieval tool
[Pyserini](https://github.com/castorini/pyserini/) is required. Four parameters
are expected to be assigned in the script:
+ The crawled data from WikiHow website (`dumps` in `wikihow.data.tar.xz`)
+ The HTML templates used to mock the search result page (`templates` in
`wikihow.data.tar.xz`)
+ The indices for the search engine based on Pyserini (`indices-t/indices` in
`wikihow.data.tar.xz`)
+ The metadata of the crawled articles (`indices-t/docs/doc_meta.csv` in
`wikihow.data.tar.xz`)
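Once the archive is extracted, the four locations above can be checked before starting the proxy. The following is only a minimal sketch: it assumes the four entries sit directly under the extraction root, and the labels `search_index` and `doc_meta` as well as both function names are hypothetical, not the variable names used in `replay_url.py`.

```python
from pathlib import Path

# Relative locations inside the extracted wikihow.data.tar.xz, as listed above.
# The dictionary keys are illustrative labels, not names used by replay_url.py.
REPLAY_DATA_LAYOUT = {
    "dumps": "dumps",
    "templates": "templates",
    "search_index": "indices-t/indices",
    "doc_meta": "indices-t/docs/doc_meta.csv",
}


def resolve_replay_paths(data_root):
    """Map each label to its path under data_root."""
    root = Path(data_root)
    return {label: root / rel for label, rel in REPLAY_DATA_LAYOUT.items()}


def missing_replay_paths(data_root):
    """Return the labels whose path does not exist under data_root."""
    return [label for label, path in resolve_replay_paths(data_root).items()
            if not path.exists()]
```

An empty list from `missing_replay_paths` suggests that the paths are ready to be copied into the script.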
All the required data are provided in `wikihow.data.tar.xz`. The archive is
about 78 GiB, and the decompressed data are about 88 GiB. The archive is split
into two pieces (`wikihow.data.tar.xz.00` and `wikihow.data.tar.xz.01`), which
can be concatenated with `cat`:
```sh
cat wikihow.data.tar.xz.00 wikihow.data.tar.xz.01 >wikihow.data.tar.xz
```
The SHA256 checksums provided in `wikihow.data.tar.xz.sha256` can be used to
verify the integrity of the download.
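For an archive of this size, the digest is best computed in chunks rather than by reading the whole file into memory. The sketch below shows one way to do that in Python. It assumes the checksum file uses the common `sha256sum` layout (`<hex digest>  <file name>` per line), which the source does not confirm, and the function names are illustrative.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so a large archive never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_lines(text):
    """Parse '<hex digest>  <file name>' lines (assumed sha256sum-style layout)."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            # sha256sum marks binary-mode entries with a leading '*'.
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify(path, expected_digest):
    """Compare the computed digest of path with the expected hex digest."""
    return sha256_of_file(path) == expected_digest.lower()
```

A file passes when `verify(name, parse_checksum_lines(text)[name])` is true for its entry in the checksum file.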
To run the script:
```sh
mitmproxy --showhost -s replay_url.py
```
### Certificate Unpinning Plan
The `syscert` plan proposed by Mobile-Env works for WikiHow app. You can
complete the config according to the [guideline of
Mobile-Env](https://github.com/X-LANCE/Mobile-Env/blob/master/docs/dynamic-app-en.md).
The available APK package from [APKCombo](https://apkcombo.com/) is provided.
For the best compatibility and a root-enabled ADBD, use the AVD image for
Android 11.0 (API Level 30) (Google APIs).
### Human-Rewritten Instructions
Human-rewritten instructions for the *canonical set* are released under
`instruction_rewriting/`. An AndroidEnv wrapper `InstructionRewritingWrapper`
is provided to load the rewritten instructions (`merged_doccano.json`) and
public patterns (`pattern-*.txt`). The annotations are collected via
[doccano](https://doccano.github.io/doccano/). The patterns are parsed by
[`sentence_pattern.py`](instruction_rewriting/sentence_pattern.py).
### Details of Sub-Tasks
WikiHow tasks are crafted from 16 types of sub-tasks:
* `home2search`, instructing to search for an article from the home page.
* `search2article`, `author2article`, & `category2article`, instructing to
access an article from search result page, author information page, and
category content page, respectively.
* `article2about`, instructing to access the about page from article page.
* `article2author`, instructing to access author information page from article
page.
* `article2category`, instructing to access category content page from article
page.
* `article2reference`, instructing to check reference list on article page.
* `article2rate_no`, instructing to give the article a "no" rating.
* `article2rate_yes`, instructing to give the article a "yes" rating.
* `article2share`, instructing to share the article.
* `article2bookmark`, instructing to bookmark article and then check the
bookmarks.
* `article2steps`, crafted from `stepped_summary` questions in
[wikihow-lists](https://huggingface.co/datasets/b-mc2/wikihow_lists)
* `article2ingredientes`, crafted from `ingredients` questions in
[wikihow-lists](https://huggingface.co/datasets/b-mc2/wikihow_lists)
* `article2needed_items`, crafted from `needed_items` questions in
[wikihow-lists](https://huggingface.co/datasets/b-mc2/wikihow_lists)
* `article2summary`, crafted from
[WikiHowNFQA](https://huggingface.co/datasets/Lurunchik/WikiHowNFQA) tasks
A template is composed for each sub-task, containing a set of slots to be
filled with keywords such as article title, author name, question, and
ground-truth answer. These keywords are sampled from the crawled app data
or from the two QA datasets to instantiate the templates. Subsequently, the
instantiated templates are concatenated into multi-stage task definitions under
the constraint that the target page/element/answer (the part after `2`, *e.g.*,
`share` from `article2share`) is directly on/referenced by the current page
(the part before `2`, *e.g.*, `article` from `article2share`). Finally, we
obtained a task set of 150 multi-stage tasks, each containing 2.68
single-stage sub-tasks on average.
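The `<source>2<target>` naming convention makes the chaining constraint easy to illustrate at the level of names. The sketch below is a simplification for illustration, not the template toolkit of Mobile-Env: it only checks that each sub-task starts on the page type where the previous one ended, and both function names are hypothetical.

```python
def split_subtask(name):
    """Split 'article2share' into ('article', 'share') at the first '2'."""
    source, sep, target = name.partition("2")
    if not sep or not source or not target:
        raise ValueError(f"not a '<source>2<target>' sub-task name: {name!r}")
    return source, target


def is_valid_chain(subtasks):
    """True if every sub-task starts on the page the previous one ended on."""
    pairs = [split_subtask(name) for name in subtasks]
    return all(prev_target == next_source
               for (_, prev_target), (next_source, _) in zip(pairs, pairs[1:]))
```

For instance, `["home2search", "search2article", "article2share"]` forms a valid chain, while `["home2search", "article2share"]` does not, because the search result page is not an article page.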
The multi-stage tasks containing different sub-tasks are suffixed with different
numbers. The meanings of the suffixes and the number of tasks with each suffix
in the micro canonical set are listed in the following table:
| Suffix | Sub-tasks | #Tasks |
|--------|------------------------------------------|--------|
| 0 | `home-search-article-about` | 18 |
| 1 | `home-search-article-rate_no` | 6 |
| 2 | `home-search-article-rate_yes` | 10 |
| 3 | `home-search-article-share` | 11 |
| 4 | `home-search-article-author[-article]` | 7 |
| 5 | `home-search-article-bookmark` | 13 |
| 6 | `home-search-article-category[-article]` | 9 |
| 7 | `home-search-article-reference` | 11 |
| 8 | `home-search-article` | 25 |
| 9 | `home-search-steps` | 15 |
| 10 | `home-search-needed_items` | 10 |
| 11 | `home-search-ingredients` | 5 |
| 12 | `home-search-summary` | 10 |
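As a quick consistency check, the per-suffix counts in the table can be tallied against the stated size of the micro canonical set. The snippet below simply copies the numbers from the table; the variable names are illustrative.

```python
# Task counts per suffix, copied from the table above.
MICRO_CANON_COUNTS = {
    0: 18, 1: 6, 2: 10, 3: 11, 4: 7, 5: 13, 6: 9,
    7: 11, 8: 25, 9: 15, 10: 10, 11: 5, 12: 10,
}

total_tasks = sum(MICRO_CANON_COUNTS.values())
# Suffixes 9-12 end in a question drawn from the QA datasets
# (steps, needed_items, ingredients, summary).
qa_tasks = sum(MICRO_CANON_COUNTS[suffix] for suffix in range(9, 13))

print(total_tasks)  # 150
print(qa_tasks)     # 40
```

The total matches the 150 tasks of the micro canonical set, 40 of which end in a question-answering stage.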
### About
This task set is developed and maintained by [SJTU
X-Lance](https://x-lance.sjtu.edu.cn/en). The corresponding paper is available
at <https://arxiv.org/abs/2305.08144>.
If you find WikiHow task set useful in your research, you can cite the project
using the following BibTeX:
```bibtex
@article{DanyangZhang2023_MobileEnv_WikiHow,
title = {{Mobile-Env}: An Evaluation Platform and Benchmark for LLM-GUI Interaction},
author = {Danyang Zhang and
Lu Chen and
Zihan Zhao and
Ruisheng Cao and
Kai Yu},
journal = {CoRR},
volume = {abs/2305.08144},
year = {2023},
url = {https://arxiv.org/abs/2305.08144},
eprinttype = {arXiv},
eprint = {2305.08144},
}
```
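Provider:
zdy023
Summary of the original information
Dataset overview
Name: WikiHow Task Set
Basis: an InfoUI interaction task set built on Mobile-Env.
Source: crawled from the WikiHow website, which hosts more than 340,000 articles.
Data volume: 107,448 pages were crawled, about 88 GiB in total.
Task definitions: several task definition templates are designed according to the functions of the WikiHow app and instantiated through the template toolkit in Mobile-Env.
Task sets:
- Canonical Set: 577 tasks sampled from the extended set, provided in `wikihow-canonical.tar.xz`.
- Canonical Subset (Micro Canonical Set): owing to budget limits, 150 tasks were tested, provided in `wikihow-microcanon.tar.xz`.
Dataset components
Website data replay:
- A replay script for mitmproxy, `replay_url.py`.
Certificate unpinning plan:
- The `syscert` plan of Mobile-Env works for the WikiHow app.
Human-rewritten instructions:
- The `instruction_rewriting/` directory contains the rewritten instructions and public patterns.
- The annotations are collected via doccano.
Details of sub-tasks:
- There are 16 types of sub-tasks, such as `home2search` and `search2article`.
- The template of each sub-task contains a set of slots for keywords such as article title and author name.
Dataset maintenance
Development and maintenance: SJTU X-Lance.
Related paper: available on arXiv.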



