
Ahmed46S4/webgym_tasks

Source: Hugging Face | Updated: 2026-03-08 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/Ahmed46S4/webgym_tasks
Resource description:
---
license: cdla-permissive-2.0
task_categories:
- reinforcement-learning
language:
- en
tags:
- web-navigation
- web-agents
- task-planning
size_categories:
- 100K<n<1M
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
  - split: test
    path: test.jsonl
---

# WebGym Tasks Dataset

## Dataset Description

This dataset contains web navigation tasks for training and evaluating autonomous web agents. Each task consists of a natural language instruction describing an action to perform on a specific website, along with evaluation criteria and metadata.

### Dataset Summary

- **Total Training Tasks**: 292,092
- **Total Test Tasks**: 1,167
- **Domains**: Multiple domains including Lifestyle & Leisure, Sports & Fitness, and more
- **Source Benchmarks**: Includes tasks from mind2web-live and other web navigation benchmarks

### Data Format

The dataset is provided in JSONL (JSON Lines) format with two splits:

- `train.jsonl`: Training set
- `test.jsonl`: Test set

### Data Fields

Each task contains the following fields:

- `benchmark_name` (string): Source benchmark (e.g., "mind2web-live")
- `task_name` (string): Natural language description of the task
- `domain` (string): High-level domain category
- `subdomain` (string): Specific subdomain category
- `website` (string): Target website URL
- `definite_answer` (string): Expected answer, if applicable
- `task_id` (string): Unique task identifier. Numeric strings for base tasks (e.g., "0", "1"), strings with a suffix for decomposed tasks (e.g., "100002_d1")
- `difficulty` (integer): Task difficulty level
- `evaluator_reference` (list): Evaluation criteria with descriptions and facts
- `task_id_decomposed_from` (string/null): Parent task ID if this is a decomposed subtask, otherwise null. Loaded as a string to match the `task_id` format

### Example

```json
{
  "benchmark_name": "mind2web-live",
  "task_name": "Find the score of the 2020 Super Bowl in nfl.com",
  "domain": "Lifestyle & Leisure",
  "subdomain": "Sports & Fitness",
  "website": "https://nfl.com",
  "definite_answer": "",
  "task_id": "0",
  "difficulty": 2,
  "evaluator_reference": [
    {
      "id": 1,
      "description": "find score information for the 2020 Super Bowl",
      "facts": [
        "score of the 2020 Super Bowl",
        "information found on nfl.com"
      ]
    }
  ],
  "task_id_decomposed_from": null
}
```

### Dataset Notes

**Synthetic Data Disclosure**: This dataset contains synthetically generated tasks and may include synthetic components in task descriptions, evaluation criteria, and other fields.

**Task ID Fields**: Both `task_id` and `task_id_decomposed_from` are stored as strings for consistency. The `task_id` field contains numeric strings for base tasks (e.g., "0", "1") and strings with suffixes for decomposed tasks (e.g., "100002_d1"). The dataset includes 33,497 decomposed tasks, representing about 11.5% of the training data.

### Usage

Load the dataset using the Hugging Face `datasets` library:

```python
from datasets import load_dataset

# Load from the Hugging Face Hub (repository ID taken from this listing)
dataset = load_dataset("Ahmed46S4/webgym_tasks")

# Or load from local files
dataset = load_dataset("json", data_files={
    "train": "train.jsonl",
    "test": "test.jsonl"
})

# Access examples
for task in dataset["train"]:
    print(f"Task {task['task_id']}: {task['task_name']}")
```

### Citation

If you use this dataset, please cite:

```bibtex
@article{bai2026webgym,
  title={WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks},
  author={Bai, Hao and Taymanov, Alexey and Zhang, Tong and Kumar, Aviral and Whitehead, Spencer},
  journal={arXiv preprint arXiv:2601.02439},
  year={2026}
}
```

### License

This dataset is released under the [Community Data License Agreement - Permissive - Version 2.0 (CDLA-Permissive-2.0)](https://cdla.dev/permissive-2-0/).
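
### Working with Task IDs

The task ID convention described in the dataset notes (string IDs, with `task_id_decomposed_from` pointing at a parent task) can be used to separate base tasks from decomposed subtasks. The following is a minimal sketch using only the Python standard library, not part of the dataset's official tooling. The helper names `normalize_id` and `group_by_parent` and the sample records are hypothetical. The sketch also assumes that some records may carry an integer `task_id` in the raw JSON, so it coerces IDs to strings before comparing them.

```python
import json
from collections import defaultdict


def normalize_id(value):
    """Return an ID as a string, leaving None (no parent) untouched."""
    return None if value is None else str(value)


def group_by_parent(jsonl_lines):
    """Split JSONL task records into base tasks and subtask IDs keyed by parent ID."""
    base_tasks = {}
    subtasks = defaultdict(list)
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines in the file
        task = json.loads(line)
        task_id = normalize_id(task["task_id"])
        parent = normalize_id(task.get("task_id_decomposed_from"))
        if parent is None:
            base_tasks[task_id] = task  # no parent: treat as a base task
        else:
            subtasks[parent].append(task_id)  # decomposed subtask
    return base_tasks, dict(subtasks)


# Hypothetical sample records; only the ID formats follow the dataset card.
sample = [
    '{"task_id": "100002", "task_name": "base task", "task_id_decomposed_from": null}',
    '{"task_id": "100002_d1", "task_name": "subtask", "task_id_decomposed_from": "100002"}',
    '{"task_id": 0, "task_name": "integer id", "task_id_decomposed_from": null}',
]

base_tasks, subtasks = group_by_parent(sample)
print(sorted(base_tasks))
print(subtasks)
```

To run this on the real data, pass an open file handle for `train.jsonl` instead of `sample`, since iterating over a text file yields one line at a time.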
Provider:
Ahmed46S4