WONDERBREAD: A Benchmark + Dataset for Business Process Management (BPM) Tasks

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://zenodo.org/record/11468259

Resource description:

Paper: WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

Background

The WONDERBREAD dataset contains 2,928 human demonstrations of 598 web navigation workflows across 6 types of BPM tasks. These tasks measure the ability of a model to generate accurate documentation, assist in knowledge transfer, and improve the efficiency of workflows. Please see our website for more details: https://wonderbread.stanford.edu/

Quick Start

To start, download debug_demos.zip (~1 GB). It contains a subset of 24 demonstrations that gives a sense of how the dataset is structured.

To reproduce the paper, download gold_demos.zip (~33 GB). It contains the 724 demonstrations corresponding to the 162 "Gold" tasks, which were used for all the evaluations in the original paper.

To obtain the full dataset, download demos.zip (~133 GB). It contains all 2,928 demonstrations and can be used for training, fine-tuning, and evaluating models.

Dataset Structure

The dataset contains several files, defined below.

Raw Data (useful for training/fine-tuning/evaluation)

debug_demos.zip -- a subset of only 24 demonstrations taken from the full dataset. Useful to get a sense of the dataset and for debugging.
gold_demos.zip -- a subset of only 724 demonstrations corresponding to the 162 "Gold" tasks. This is the dataset that was used for all evaluations in the original WONDERBREAD paper.
demos.zip -- all 2,928 demonstrations across 598 tasks. Useful for training your own models.

Evaluation (useful for evaluation)

qa_dataset.csv -- contains all 120 questions and ground truth answers used in the "Knowledge Transfer" evaluation.
df_rankings.csv -- contains the rankings of all "Gold" tasks used in the "SOP Ranking" evaluation.

Metadata (can be safely ignored)

Process Mining Task Demonstrations.xlsx -- maps human annotators to specific demonstrations; also contains "Gold" task rankings used in the "SOP Ranking" evaluation.
metadata.json -- maps Google Drive URLs to Google Drive Folder IDs to demonstration names.
df_valid.csv -- tracks assets associated with each demonstration.
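
A quick way to get a feel for debug_demos.zip before unpacking it is to count the files under each top-level entry of the archive. The sketch below is only an illustration using the Python standard library. The helper name summarize_archive is hypothetical, and the description above does not document the internal layout of the archives, so the code assumes nothing beyond "this is a zip file".

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count the files stored under each top-level entry of a zip archive.

    Hypothetical helper for illustration; it makes no assumption about
    how the WONDERBREAD archives are organized internally.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            parts = PurePosixPath(info.filename).parts
            if parts:
                counts[parts[0]] += 1  # group by first path component
    return dict(counts)
```

For example, after downloading the archive into the working directory, summarize_archive("debug_demos.zip") returns a dictionary mapping each top-level name to its file count. Only the zip index is read, so nothing is extracted to disk.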

Created:

2024-10-14
