Ahmed46S4/webgym_tasks

Name: Ahmed46S4/webgym_tasks
Creator: Ahmed46S4
Published: 2026-03-08 09:25:22
License: No description provided

Hugging Face | Updated: 2026-03-08 | Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/Ahmed46S4/webgym_tasks

Description:

---
license: cdla-permissive-2.0
task_categories:
- reinforcement-learning
language:
- en
tags:
- web-navigation
- web-agents
- task-planning
size_categories:
- 100K<n<1M
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
  - split: test
    path: test.jsonl
---

# WebGym Tasks Dataset

## Dataset Description

This dataset contains web navigation tasks for training and evaluating autonomous web agents. Each task consists of a natural language instruction that describes an action to be performed on a specific website, along with evaluation criteria and metadata.

### Dataset Summary

- **Total Training Tasks**: 292,092
- **Total Test Tasks**: 1,167
- **Domains**: Multiple domains including Lifestyle & Leisure, Sports & Fitness, and more
- **Source Benchmarks**: Includes tasks from mind2web-live and other web navigation benchmarks

### Data Format

The dataset is provided in JSONL (JSON Lines) format with two splits:

- `train.jsonl`: Training set
- `test.jsonl`: Test set

### Data Fields

Each task contains the following fields:

- `benchmark_name` (string): Source benchmark (e.g., "mind2web-live")
- `task_name` (string): Natural language description of the task
- `domain` (string): High-level domain category
- `subdomain` (string): Specific subdomain category
- `website` (string): Target website URL
- `definite_answer` (string): Expected answer if applicable
- `task_id` (string): Unique task identifier. Numeric strings for base tasks (e.g., "0", "1"), strings with suffix for decomposed tasks (e.g., "100002_d1")
- `difficulty` (integer): Task difficulty level
- `evaluator_reference` (list): Evaluation criteria with descriptions and facts
- `task_id_decomposed_from` (string/null): Parent task ID if this is a decomposed subtask. Loaded as string to match task_id format

### Example

```json
{
  "benchmark_name": "mind2web-live",
  "task_name": "Find the score of the 2020 Super Bowl in nfl.com",
  "domain": "Lifestyle & Leisure",
  "subdomain": "Sports & Fitness",
  "website": "https://nfl.com",
  "definite_answer": "",
  "task_id": 0,
  "difficulty": 2,
  "evaluator_reference": [
    {
      "id": 1,
      "description": "find score information for the 2020 Super Bowl",
      "facts": [
        "score of the 2020 Super Bowl",
        "information found on nfl.com"
      ]
    }
  ],
  "task_id_decomposed_from": null
}
```

### Dataset Notes

**Synthetic Data Disclosure**: This dataset contains synthetically generated tasks and may include synthetic components in task descriptions, evaluation criteria, and other fields.

**Task ID Fields**: Both `task_id` and `task_id_decomposed_from` are stored as strings for consistency. The `task_id` field contains numeric strings for base tasks (e.g., "0", "1") and strings with suffixes for decomposed tasks (e.g., "100002_d1"). The dataset includes 33,497 decomposed tasks representing about 11.5% of the training data.

### Usage

Load the dataset using the Hugging Face datasets library:

```python
from datasets import load_dataset

# Load from HuggingFace Hub
dataset = load_dataset("your-username/webgym-tasks")

# Or load from local files
dataset = load_dataset("json", data_files={
    "train": "train.jsonl",
    "test": "test.jsonl"
})

# Access examples
for task in dataset["train"]:
    print(f"Task {task['task_id']}: {task['task_name']}")
```

### Citation

If you use this dataset, please cite:

```bibtex
@article{bai2026webgym,
  title={WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks},
  author={Bai, Hao and Taymanov, Alexey and Zhang, Tong and Kumar, Aviral and Whitehead, Spencer},
  journal={arXiv preprint arXiv:2601.02439},
  year={2026}
}
```

### License

This dataset is released under the [Community Data License Agreement - Permissive - Version 2.0 (CDLA-Permissive-2.0)](https://cdla.dev/permissive-2-0/).
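
The task ID conventions described under Data Fields and Dataset Notes can be illustrated with a small sketch that normalizes both ID fields to strings and groups decomposed subtasks under their parent. This is only an illustration under stated assumptions: the helper names (`normalize_task`, `is_decomposed`, `group_by_parent`) are hypothetical and not part of the dataset or any library, the second sample record is invented for the demonstration, and the assumption that a subtask such as "100002_d1" points at parent "100002" is a guess from the ID format rather than something the dataset card states.

```python
import json
from collections import defaultdict


def normalize_task(record):
    """Return a copy of the record with both ID fields as strings.

    A missing or null parent ID is kept as None.
    """
    task = dict(record)
    task["task_id"] = str(task["task_id"])
    parent = task.get("task_id_decomposed_from")
    task["task_id_decomposed_from"] = None if parent is None else str(parent)
    return task


def is_decomposed(task):
    """A task counts as decomposed when it names a parent task ID."""
    return task["task_id_decomposed_from"] is not None


def group_by_parent(tasks):
    """Map each parent task ID to the list of its subtask IDs."""
    groups = defaultdict(list)
    for task in tasks:
        if is_decomposed(task):
            groups[task["task_id_decomposed_from"]].append(task["task_id"])
    return dict(groups)


# Two JSONL lines: the first mirrors the example record (trimmed to the
# fields used here); the second is a hypothetical subtask made up for
# this sketch.
sample_lines = [
    '{"task_id": 0, "task_name": "Find the score of the 2020 Super Bowl in nfl.com", "task_id_decomposed_from": null}',
    '{"task_id": "100002_d1", "task_name": "hypothetical subtask", "task_id_decomposed_from": 100002}',
]

tasks = [normalize_task(json.loads(line)) for line in sample_lines]
subtasks_by_parent = group_by_parent(tasks)
print(subtasks_by_parent)
```

Casting IDs with `str` before comparing them avoids mismatches between a raw numeric ID such as `0` and its string form `"0"`. The same functions could be applied to records read line by line from `train.jsonl`.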

Provided by:

Ahmed46S4
