
WONDERBREAD: A Benchmark + Dataset for Business Process Management (BPM) Tasks

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11468259
Description:
Paper: WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

Background

The WONDERBREAD dataset contains 2,928 human demonstrations of 598 web navigation workflows across 6 types of BPM tasks. These tasks measure a model's ability to generate accurate documentation, assist in knowledge transfer, and improve the efficiency of workflows. Please see our website for more details: https://wonderbread.stanford.edu/

Quick Start

To start, download debug_demos.zip (~1 GB). It contains a subset of 24 demonstrations that gives a sense of how the dataset is structured.

To reproduce the paper, download gold_demos.zip (~33 GB). It contains the 724 demonstrations corresponding to the 162 "Gold" tasks that were used for all evaluations in the original paper.

To obtain the full dataset, download demos.zip (~133 GB). It contains all 2,928 demonstrations and can be used for training, fine-tuning, and evaluating models.

Dataset Structure

The dataset contains several files, defined below.

Raw Data (useful for training/fine-tuning/evaluation)

debug_demos.zip -- a subset of only 24 demonstrations taken from the full dataset. Useful for getting a sense of the dataset and for debugging.
gold_demos.zip -- a subset of only 724 demonstrations corresponding to the 162 "Gold" tasks. This is the dataset that was used for all evaluations in the original WONDERBREAD paper.
demos.zip -- all 2,928 demonstrations across 598 tasks. Useful for training your own models.

Evaluation (useful for evaluation)

qa_dataset.csv -- contains all 120 questions and ground truth answers used in the "Knowledge Transfer" evaluation.
df_rankings.csv -- contains the rankings of all "Gold" tasks used in the "SOP Ranking" evaluation.

Metadata (can be safely ignored)

Process Mining Task Demonstrations.xlsx -- maps human annotators to specific demonstrations; also contains "Gold" task rankings used in the "SOP Ranking" evaluation.
metadata.json -- maps Google Drive URLs to Google Drive Folder IDs to demonstration names.
df_valid.csv -- tracks assets associated with each demonstration.
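A first look at the downloaded files can be sketched in Python as follows. This is a minimal illustration, not part of the official WONDERBREAD tooling. It assumes debug_demos.zip and qa_dataset.csv have been saved to the current working directory. The helper names top_level_entries and csv_header_and_rows are hypothetical. The description does not document the internal layout of the archives or the column names of the CSV files, so the code only lists what it finds rather than assuming any particular structure.

```python
import csv
import zipfile
from collections import Counter
from pathlib import Path


def top_level_entries(zip_path):
    """Count archive members under each top-level name in a zip file."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # The first path component is the top-level folder or file.
            top = name.split("/", 1)[0]
            counts[top] += 1
    return counts


def csv_header_and_rows(csv_path, limit=3):
    """Return the header row and the first `limit` data rows of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(limit), reader)]
    return header, rows


if __name__ == "__main__":
    # Assumed local paths; adjust to wherever the files were downloaded.
    demos = Path("debug_demos.zip")
    questions = Path("qa_dataset.csv")

    if demos.exists():
        for name, count in sorted(top_level_entries(demos).items()):
            print(f"{name}: {count} archive members")

    if questions.exists():
        header, rows = csv_header_and_rows(questions)
        print("columns:", header)
        for row in rows:
            print(row)
```

Starting with the ~1 GB debug archive keeps this inspection quick; the same helpers should work unchanged on gold_demos.zip or demos.zip once the layout is understood.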
Created:
2024-10-14