WONDERBREAD: A Benchmark + Dataset for Business Process Management (BPM) Tasks
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/11468259
Resource description:
Paper: WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks
Background
The WONDERBREAD dataset contains 2,928 human demonstrations of 598 web navigation workflows across 6 types of BPM tasks. These tasks measure a model's ability to generate accurate documentation, assist in knowledge transfer, and improve the efficiency of workflows.
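These counts imply an average of roughly five demonstrations per workflow. The short Python calculation below makes that explicit for the full dataset and for the 724-demonstration, 162-task "Gold" subset described under Quick Start. `demos_per_task` is an illustrative helper, and a mean says nothing about how evenly demonstrations are spread across tasks.

```python
# Average demonstrations per task, using only the counts stated in this description.
def demos_per_task(num_demos: int, num_tasks: int) -> float:
    """Return the mean number of demonstrations recorded per task."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return num_demos / num_tasks


full = demos_per_task(2928, 598)  # full dataset (demos.zip)
gold = demos_per_task(724, 162)   # "Gold" subset (gold_demos.zip)
print(f"full: {full:.2f} demos/task, gold: {gold:.2f} demos/task")
```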
Please see our website for more details: https://wonderbread.stanford.edu/
Quick Start
To start, download debug_demos.zip (~1 GB). It contains a subset of 24 demonstrations which can give you a sense of how the dataset is structured.
To reproduce the paper's results, download gold_demos.zip (~33 GB). It contains 724 demonstrations corresponding to the 162 "Gold" tasks that were used for all evaluations in the original paper.
To obtain the full dataset, download demos.zip (~133 GB). This contains all 2,928 demonstrations and can be used for training, fine-tuning, and evaluating models.
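A minimal download sketch using only the Python standard library. It assumes Zenodo's usual direct-file URL pattern (`https://zenodo.org/records/<id>/files/<name>`), which this description does not state, so check the record page before relying on it. `ARCHIVES`, `archive_url`, and `download_and_extract` are hypothetical helper names, and the network call is left commented out so nothing is fetched by accident.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

RECORD_ID = "11468259"  # taken from the Zenodo link above

# Archive names and approximate sizes as listed in this description.
ARCHIVES = {
    "debug": ("debug_demos.zip", "~1 GB, 24 demonstrations"),
    "gold": ("gold_demos.zip", "~33 GB, 724 demonstrations (paper evaluations)"),
    "full": ("demos.zip", "~133 GB, all 2,928 demonstrations"),
}


def archive_url(filename: str, record_id: str = RECORD_ID) -> str:
    """Build a direct-download URL.

    ASSUMPTION: follows Zenodo's usual /records/<id>/files/<name> pattern;
    verify against the record page before use.
    """
    return f"https://zenodo.org/records/{record_id}/files/{filename}?download=1"


def download_and_extract(subset: str, dest: Path) -> Path:
    """Stream the chosen archive to disk, then extract it under dest."""
    filename, _ = ARCHIVES[subset]
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / filename
    # Copy in 1 MiB chunks so multi-GB archives are never held in memory.
    with urllib.request.urlopen(archive_url(filename)) as resp, open(zip_path, "wb") as out:
        shutil.copyfileobj(resp, out, length=1 << 20)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest


if __name__ == "__main__":
    print(archive_url(ARCHIVES["debug"][0]))
    # download_and_extract("debug", Path("wonderbread_data"))
```

Starting with `"debug"` keeps the first run to roughly 1 GB; `"gold"` is the subset used for the paper's evaluations.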
Dataset Structure
The dataset contains several files, defined below.
Raw Data (useful for training/fine-tuning/evaluation)
debug_demos.zip -- a subset of only 24 demonstrations taken from the full dataset. Useful to get a sense of the dataset and for debugging.
gold_demos.zip -- a subset of only 724 demonstrations corresponding to the 162 "Gold" tasks. This is the dataset that was used for all evaluations in the original WONDERBREAD paper.
demos.zip -- all 2,928 demonstrations across 598 tasks. Useful for training your own models.
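Because the archives are large, it can help to look inside one before extracting it. The sketch below uses Python's standard `zipfile` module to count members per folder without extracting anything. The internal archive layout is not documented here, so the function assumes nothing about it; `group_members` and the member names in the demo are hypothetical.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath
from typing import BinaryIO, Union


def group_members(archive: Union[str, BinaryIO], depth: int = 1) -> Counter:
    """Count archive members grouped by their first `depth` path components.

    Only the zip's central directory is read; nothing is extracted.
    Members with fewer than `depth` components are skipped.
    """
    counts: Counter = Counter()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Zip member names always use "/" separators, hence PurePosixPath.
            parts = PurePosixPath(name).parts
            if len(parts) >= depth:
                counts["/".join(parts[:depth])] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory archive standing in for debug_demos.zip (hypothetical names).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("root/demo_a/trace.json", "{}")
        zf.writestr("root/demo_a/screen.png", b"")
        zf.writestr("root/demo_b/trace.json", "{}")
    print(group_members(buf, depth=2))
    # Real use: group_members("debug_demos.zip", depth=1), then raise depth
    # if everything sits under a single root folder.
```

If the layout turns out to be one folder per demonstration, the group count at the right depth should match the figures above (24 for debug_demos.zip), which makes a cheap integrity check. That layout is an assumption to verify, not a documented fact.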
Evaluation (useful for evaluation)
qa_dataset.csv -- contains all 120 questions and ground-truth answers used in the "Knowledge Transfer" evaluation.
df_rankings.csv -- contains the rankings of all "Gold" tasks used in the "SOP Ranking" evaluation.
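The evaluation files are plain CSVs, but their column names are not listed here. A cautious first step is to load each file and print its row count and header before writing any scoring code. This standard-library sketch does that. `read_rows`, `load_csv`, and `summarize` are hypothetical helpers, and the check against 120 rows assumes qa_dataset.csv stores one question per row.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def read_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text whose first row is the header into one dict per row."""
    return list(csv.DictReader(lines))


def load_csv(path: Path) -> List[Dict[str, str]]:
    # newline="" lets the csv module handle line breaks inside quoted fields
    # (e.g. multi-line answers). UTF-8 encoding is an assumption.
    with open(path, newline="", encoding="utf-8") as f:
        return read_rows(f)


def summarize(name: str, rows: List[Dict[str, str]], expected_rows: Optional[int] = None) -> str:
    """One-line summary: row count, optional mismatch note, and column names."""
    cols = list(rows[0].keys()) if rows else []
    note = ""
    if expected_rows is not None and len(rows) != expected_rows:
        note = f" (expected {expected_rows})"
    return f"{name}: {len(rows)} rows{note}; columns: {cols}"


if __name__ == "__main__":
    # 120 assumes one row per question; df_rankings.csv has no stated size.
    for filename, expected in [("qa_dataset.csv", 120), ("df_rankings.csv", None)]:
        path = Path(filename)
        if path.exists():
            print(summarize(filename, load_csv(path), expected))
        else:
            print(f"{filename}: not found in the current directory")
```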
Metadata (can be safely ignored)
Process Mining Task Demonstrations.xlsx -- maps human annotators to specific demonstrations; also contains "Gold" task rankings used in the "SOP Ranking" evaluation.
metadata.json -- maps Google Drive URLs to Google Drive folder IDs, and folder IDs to demonstration names.
df_valid.csv -- tracks assets associated with each demonstration
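These files can be ignored for evaluation, but if you do need metadata.json, its exact nesting is not documented here. One generic way to discover it is to walk the parsed JSON and print the key path to each value. The sketch below does this with the standard `json` module and assumes nothing about the shape beyond valid JSON; `iter_leaves` and `preview_metadata` are hypothetical names.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_leaves(node: Any, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (key_path, value) for every non-container value in parsed JSON.

    List indices are converted to strings so every key path has one type.
    Empty dicts and lists yield nothing.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_leaves(child, path + (str(key),))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_leaves(child, path + (str(index),))
    else:
        yield path, node


def preview_metadata(path: Path, limit: int = 5) -> None:
    """Print the first `limit` key paths and values from a JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for key_path, value in islice(iter_leaves(data), limit):
        print(" -> ".join(key_path) or "<root>", "=", value)


if __name__ == "__main__":
    meta = Path("metadata.json")
    if meta.exists():
        preview_metadata(meta)
```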
Created: 2024-10-14



