Halluminate/westworld

Name: Halluminate/westworld
Creator: Halluminate
Published: 2025-11-19 06:03:38
License: Apache-2.0 (per the dataset card)

Hugging Face | Updated: 2025-11-19 | Indexed: 2026-01-03

Download link:

https://hf-mirror.com/datasets/Halluminate/westworld

Description:

---
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: validation
    path: data/validation-*
dataset_info:
  features:
  - name: task_id
    dtype: string
  - name: task_generation_config_json
    dtype: string
  - name: env
    dtype: string
  - name: domain
    dtype: string
  - name: l1_category
    dtype: string
  - name: l2_category
    dtype: string
  - name: suggested_difficulty
    dtype: string
  - name: suggested_hint
    dtype: string
  - name: suggested_max_steps
    dtype: int64
  - name: suggested_split
    dtype: string
  - name: metadata_json
    dtype: string
  splits:
  - name: validation
    num_bytes: 133363
    num_examples: 100
  download_size: 34438
  dataset_size: 133363
pretty_name: Halluminate Westworld
size_categories:
- 1K<n<10K
---

# Halluminate Westworld

A benchmark that evaluates web agents on tasks performed on realistic websites.

Repo: https://github.com/Halluminate/westworld

Blog post: https://halluminate.ai/blog/westworld

## Quick Start: Try a Task Yourself

Want to understand what the benchmark tasks look like? You can run them manually using our human-in-the-loop demo:

### Step 1: Install with Browser Support

```bash
# Using uv (recommended)
uv pip install -e ".[datasets,playwright]"
python -m playwright install chromium

# Or using pip
pip install -e ".[datasets,playwright]"
python -m playwright install chromium
```

### Step 2: Set Your API Key (for simulated environments)

```bash
export HALLUMINATE_API_KEY=your-key-here
```

*Contact wyatt@halluminate.ai for an API key.*

### Step 3: Run the Demo

You can run the demo in two ways:

**Option A: Run by dataset index**

```bash
westworld-demo --index 0
```

**Option B: Run by specific task ID**

```bash
westworld-demo --task-id westworld/azora/basic_checkout/0
```

**Alternative: Run as Python module**

```bash
# By index
python -m westworld.demo --index 0

# By task ID
python -m westworld.demo --task-id westworld/azora/basic_checkout/0
```

## Dataset

The benchmark dataset is available on [HuggingFace](https://huggingface.co/datasets/Halluminate/westworld):

```python
from datasets import load_dataset

# Returns a DatasetDict; the only split in this dataset is "validation"
dataset = load_dataset("Halluminate/westworld")
```

## Usage

### Loading and Running Tasks

```python
from westworld.base import DatasetItem, instantiate

# Load a task from the dataset.
# `dataset` is the DatasetDict loaded above, so select the split before indexing.
task_item = DatasetItem(**dataset["validation"][0])

# Generate the task configuration
task_config = task_item.generate_task_config()

# Access task details
print(f"Task: {task_config.task}")
print(f"URL: {task_config.url}")
print(f"Evaluation Config: {task_config.eval_config}")
```

### Task Categories

The benchmark includes the following task categories (L1 categories):

- **e_commerce**: Online shopping tasks across multiple platforms
  - Basic checkout flows
  - Delivery instruction handling
  - Pickup order management
- **travel**: Travel booking and search tasks
  - Flight searches (basic, roundtrip, date ranges)
  - Airline-specific searches
  - Hotel searches
  - Budget-constrained searches

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## Citation

If you use Halluminate Westworld in your research, please cite:

```bibtex
@software{halluminate_westworld,
  title = {Halluminate Westworld: A Web Agent Benchmark},
  author = {Halluminate},
  year = {2025},
  url = {https://github.com/Halluminate/westworld}
}
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Contact

For questions or issues, please open an issue on [GitHub](https://github.com/Halluminate/westworld/issues).
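
## Example: Summarizing Tasks by Category

The dataset card lists `l1_category`, `l2_category`, and `suggested_difficulty` as plain string columns, and stores `task_generation_config_json` and `metadata_json` as JSON-encoded strings. The sketch below shows one way to count tasks per category and decode the JSON columns. The helper names (`summarize_tasks`, `decode_json_fields`) and every value in `sample_rows` other than the task ID format are hypothetical and are not part of the `westworld` package or the real data.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, Mapping, Tuple


def summarize_tasks(rows: Iterable[Mapping[str, Any]]) -> Counter:
    """Count rows per (l1_category, l2_category, suggested_difficulty)."""
    counts: Counter = Counter()
    for row in rows:
        key: Tuple[str, str, str] = (
            row["l1_category"],
            row["l2_category"],
            row["suggested_difficulty"],
        )
        counts[key] += 1
    return counts


def decode_json_fields(row: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of `row` with the *_json string columns parsed."""
    decoded = dict(row)
    for field in ("task_generation_config_json", "metadata_json"):
        raw = decoded.get(field)
        # Only parse non-empty strings; leave missing or empty values alone.
        if isinstance(raw, str) and raw:
            decoded[field] = json.loads(raw)
    return decoded


# Hypothetical rows for illustration only; category, difficulty, and
# metadata values are assumptions, not taken from the real dataset.
sample_rows = [
    {
        "task_id": "westworld/azora/basic_checkout/0",
        "l1_category": "e_commerce",
        "l2_category": "basic_checkout",
        "suggested_difficulty": "easy",
        "metadata_json": '{"note": "hypothetical"}',
    },
    {
        "task_id": "westworld/azora/basic_checkout/1",
        "l1_category": "e_commerce",
        "l2_category": "basic_checkout",
        "suggested_difficulty": "easy",
        "metadata_json": "{}",
    },
    {
        "task_id": "westworld/example/flight_search/0",
        "l1_category": "travel",
        "l2_category": "flight_search",
        "suggested_difficulty": "hard",
        "metadata_json": "{}",
    },
]

if __name__ == "__main__":
    for key, count in sorted(summarize_tasks(sample_rows).items()):
        print(key, count)
    print(decode_json_fields(sample_rows[0])["metadata_json"])
```

With the real data, the same helpers should accept the rows yielded by iterating `load_dataset("Halluminate/westworld", split="validation")`, since each row is a dictionary keyed by the column names above.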

Provider:

Halluminate