zdy023/WikiHow-taskset

Name: zdy023/WikiHow-taskset
Creator: zdy023
Published: 2024-04-30 07:49:59
License: No description provided

Hugging Face · Updated 2024-04-30 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/zdy023/WikiHow-taskset



Description:

---
license: apache-2.0
---

(Works with [Mobile-Env v3.x](https://github.com/X-LANCE/Mobile-Env/tree/v3.0).)

# WikiHow Task Set

The WikiHow task set is an InfoUI interaction task set based on [Mobile-Env](https://github.com/X-LANCE/Mobile-Env) proposed in [*Mobile-Env: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era*](https://arxiv.org/abs/2305.08144).

[WikiHow](https://www.wikihow.com/Main-Page) is a collaborative wiki site about various real-life tips with more than 340,000 online articles. To construct the task set, 107,448 pages were crawled, and the dumped website data occupy about 88 GiB in total.

Several task definition templates are designed according to the functions of the WikiHow app, and task definitions are instantiated through the template toolkit in Mobile-Env. 577 tasks are sampled from the extended set to form the *canonical set* (`wikihow-canonical.tar.xz`). Owing to budget limits, only 150 tasks are tested using the proposed LLM-based agent. These 150 tasks are given in `wikihow-microcanon.tar.xz`. We call this the *canonical subset* or the *micro canonical set*.

### Website Data Replay

The replay script for [mitmproxy](https://mitmproxy.org/) is given as `replay_url.py`. To use this replay script, the information retrieval tool [Pyserini](https://github.com/castorini/pyserini/) is required. Four parameters are expected to be assigned in the script:

+ The crawled data from WikiHow website (`dumps` in `wikihow.data.tar.xz`)
+ The HTML templates used to mock the search result page (`templates` in `wikihow.data.tar.xz`)
+ The indices for the search engine based on Pyserini (`indices-t/indices` in `wikihow.data.tar.xz`)
+ The metadata of the crawled articles (`indices-t/docs/doc_meta.csv` in `wikihow.data.tar.xz`)

All the required data are offered in `wikihow.data.tar.xz`. (The archive is about 78 GiB, and the decompressed data are about 88 GiB.)
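As a rough illustration of the four parameters listed above, the following sketch resolves their locations under a directory where `wikihow.data.tar.xz` has been extracted and reports which ones are missing. The function name `replay_paths`, the dictionary keys, and the assumption that the four entries sit directly under the extraction root are made up for this example; they are not taken from `replay_url.py`, whose actual variable names should be checked in the script itself.

```python
from pathlib import Path

# Relative locations inside the extracted archive, as listed above.
# The keys are illustrative labels, not names used by replay_url.py.
EXPECTED = {
    "dumps": "dumps",
    "templates": "templates",
    "indices": "indices-t/indices",
    "doc_meta": "indices-t/docs/doc_meta.csv",
}


def replay_paths(root):
    """Return (paths, missing) for the four replay inputs under `root`.

    `paths` maps each label to its expected location; `missing` lists the
    labels whose location does not exist on disk.
    """
    root = Path(root)
    paths = {key: root / rel for key, rel in EXPECTED.items()}
    missing = [key for key, path in paths.items() if not path.exists()]
    return paths, missing
```

Calling `replay_paths("/path/to/extracted/data")` before starting the proxy gives a quick check that the values about to be assigned in the script point at existing files and directories.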
The archive is split into two pieces (`wikihow.data.tar.xz.00` and `wikihow.data.tar.xz.01`). You can use `cat` to concatenate them:

```sh
cat wikihow.data.tar.xz.00 wikihow.data.tar.xz.01 >wikihow.data.tar.xz
```

The SHA256 checksums are provided in `wikihow.data.tar.xz.sha256` to check the integrity.

To run the script:

```sh
mitmproxy --showhost -s replay_url.py
```

### Certificate Unpinning Plan

The `syscert` plan proposed by Mobile-Env works for the WikiHow app. You can complete the config according to the [guideline of Mobile-Env](https://github.com/X-LANCE/Mobile-Env/blob/master/docs/dynamic-app-en.md). The available APK package from [APKCombo](https://apkcombo.com/) is provided. Note that the AVD image of Android 11.0 (API Level 30) (Google APIs) should be used to obtain the best compatibility and a root-enabled ADBD.

### Human-Rewritten Instructions

Human-rewritten instructions for the *canonical set* are released under `instruction_rewriting/`. An AndroidEnv wrapper `InstructionRewritingWrapper` is provided to load the rewritten instructions (`merged_doccano.json`) and public patterns (`pattern-*.txt`). The annotations are collected via [doccano](https://doccano.github.io/doccano/). The patterns are parsed by [`sentence_pattern.py`](instruction_rewriting/sentence_pattern.py).

### Details of Sub-Tasks

WikiHow tasks are crafted from 16 types of sub-tasks:

* `home2search`, instructing to search for an article from the home page.
* `search2article`, `author2article`, & `category2article`, instructing to access an article from the search result page, author information page, and category content page, respectively.
* `article2about`, instructing to access the about page from the article page.
* `article2author`, instructing to access the author information page from the article page.
* `article2category`, instructing to access the category content page from the article page.
* `article2reference`, instructing to check the reference list on the article page.
* `article2rate_no`, instructing to rate "no" for the article.
* `article2rate_yes`, instructing to rate "yes" for the article.
* `article2share`, instructing to share the article.
* `article2bookmark`, instructing to bookmark the article and then check the bookmarks.
* `article2steps`, crafted from `stepped_summary` questions in [wikihow-lists](https://huggingface.co/datasets/b-mc2/wikihow_lists).
* `article2ingredientes`, crafted from `ingredients` questions in [wikihow-lists](https://huggingface.co/datasets/b-mc2/wikihow_lists).
* `article2needed_items`, crafted from `needed_items` questions in [wikihow-lists](https://huggingface.co/datasets/b-mc2/wikihow_lists).
* `article2summary`, crafted from [WikiHowNFQA](https://huggingface.co/datasets/Lurunchik/WikiHowNFQA) tasks.

A template is composed for each sub-task, containing a group of slots to be filled with keywords such as article title, author name, question, and groundtruth answer. These keywords are then sampled from the crawled app data or from the two QA datasets to instantiate the templates. Subsequently, the instantiated templates are concatenated into multi-stage task definitions under the constraint that the target page/element/answer (the part after `2`, *e.g.*, `share` from `article2share`) is directly on or referenced by the current page (the part before `2`, *e.g.*, `article` from `article2share`). Finally, we obtained a task set of 150 multistage tasks, each containing 2.68 single-stage sub-tasks on average. Multistage tasks containing different sub-tasks are suffixed with different numbers.
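The concatenation constraint described above can be approximated at the level of sub-task names: the part after `2` in one stage should match the part before `2` in the next. The sketch below is only such a name-level simplification. The helpers `split_subtask` and `is_valid_chain` are hypothetical and are not part of Mobile-Env's template toolkit, which works on concrete pages and may implement the constraint differently.

```python
def split_subtask(name):
    """Split a sub-task name such as 'article2share' into (source, target)."""
    # Split on the first '2' only, so targets like 'rate_no' stay intact.
    source, sep, target = name.partition("2")
    if not sep or not source or not target:
        raise ValueError(f"not a '<source>2<target>' sub-task name: {name!r}")
    return source, target


def is_valid_chain(subtasks):
    """Check that each stage starts where the previous stage ended."""
    pairs = [split_subtask(name) for name in subtasks]
    return all(prev[1] == cur[0] for prev, cur in zip(pairs, pairs[1:]))


# A three-stage chain whose stages line up, and one with a gap.
print(is_valid_chain(["home2search", "search2article", "article2share"]))  # True
print(is_valid_chain(["home2search", "article2share"]))                    # False
```

The second call fails because `home2search` ends on the search result page while `article2share` expects to start on an article page.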
The meanings of the suffixes and the number of tasks with each suffix in the micro canonical set are listed in the following table:

| Suffix | Sub-tasks                                | #Tasks |
|--------|------------------------------------------|--------|
| 0      | `home-search-article-about`              | 18     |
| 1      | `home-search-article-rate_no`            | 6      |
| 2      | `home-search-article-rate_yes`           | 10     |
| 3      | `home-search-article-share`              | 11     |
| 4      | `home-search-article-author[-article]`   | 7      |
| 5      | `home-search-article-bookmark`           | 13     |
| 6      | `home-search-article-category[-article]` | 9      |
| 7      | `home-search-article-reference`          | 11     |
| 8      | `home-search-article`                    | 25     |
| 9      | `home-search-steps`                      | 15     |
| 10     | `home-search-needed_items`               | 10     |
| 11     | `home-search-ingredients`                | 5      |
| 12     | `home-search-summary`                    | 10     |

### About

This task set is developed and maintained by [SJTU X-Lance](https://x-lance.sjtu.edu.cn/en). The corresponding paper is available at <https://arxiv.org/abs/2305.08144>.

If you find the WikiHow task set useful in your research, you can cite the project using the following BibTeX:

```bibtex
@article{DanyangZhang2023_MobileEnv_WikiHow,
  title = {{Mobile-Env}: An Evaluation Platform and Benchmark for LLM-GUI Interaction},
  author = {Danyang Zhang and Lu Chen and Zihan Zhao and Ruisheng Cao and Kai Yu},
  journal = {CoRR},
  volume = {abs/2305.08144},
  year = {2023},
  url = {https://arxiv.org/abs/2305.08144},
  eprinttype = {arXiv},
  eprint = {2305.08144},
}
```

Provider:

zdy023

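Summary of Original Information

Dataset Overview

Name: WikiHow Task Set

Basis: An InfoUI interaction task set built on Mobile-Env.

Source: Crawled from the WikiHow website, which hosts more than 340,000 articles.

Data volume: 107,448 pages were crawled; the dumped data total about 88 GiB.

Task definitions: Several task definition templates were designed around the functions of the WikiHow app and instantiated with the template toolkit in Mobile-Env.

Task sets:

Canonical Set: 577 tasks sampled from the extended set, distributed as wikihow-canonical.tar.xz.
Canonical Subset (Micro Canonical Set): The 150 tasks that were tested, owing to budget limits, distributed as wikihow-microcanon.tar.xz.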

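Dataset Components

Website data replay:

The replay_url.py script is provided for use with mitmproxy.
The information retrieval tool Pyserini is required.
Four parameters must be set: the crawled WikiHow data, the HTML templates, the search engine indices, and the article metadata.

Certificate unpinning plan:

The syscert plan from Mobile-Env works for the WikiHow app.

Human-rewritten instructions:

The instruction_rewriting/ directory contains the rewritten instructions and the public patterns.
Annotations were collected with doccano.

Sub-task details:

There are 16 types of sub-tasks, such as home2search and search2article.
Each sub-task has a template with a group of slots to be filled with keywords such as article title and author name.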

Dataset Maintenance

Development and maintenance: SJTU X-Lance.

Related paper: Available on arXiv at https://arxiv.org/abs/2305.08144.
