hyunjun1121/MacroBench

Name: hyunjun1121/MacroBench
Creator: hyunjun1121
Published: 2025-10-10 19:30:24
License: No description available

Source: Hugging Face. Updated: 2025-10-10. Indexed: 2025-10-25.

Download link:

https://hf-mirror.com/datasets/hyunjun1121/MacroBench

Description:

MacroBench is a code-first benchmark that evaluates whether Large Language Models can synthesize reusable browser-automation programs (macros) from natural-language goals by reading HTML/DOM and emitting Selenium code. The dataset consists of 681 distinct automation tasks across six synthetic websites emulating real-world platforms (TikTok, Reddit, Instagram, Facebook, Discord, Threads) and provides complete experimental results from evaluating four state-of-the-art LLMs across 2,636 model-task combinations.
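To make the notion of a "macro" concrete, here is a minimal sketch of the kind of reusable Selenium-style function a model might be asked to emit for a goal such as "post a comment". It is an illustration only, not a task from the dataset: the function name `post_comment_macro`, the selectors `#comment-input`, `#comment-submit`, and `.comment-text`, and the URL in the usage note are all hypothetical. The only Selenium-specific detail relied on is that the CSS locator strategy is the string `"css selector"` (the value of `By.CSS_SELECTOR`).

```python
from typing import Any

# Locator strategy string for CSS selectors; this is the value of
# selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def post_comment_macro(driver: Any, base_url: str, text: str) -> bool:
    """Hypothetical macro: open a page, type a comment, submit it,
    and report whether the comment text appears on the page afterwards.

    `driver` is expected to behave like a Selenium WebDriver
    (get, find_element, find_elements). All selectors are assumed
    for illustration and do not come from MacroBench.
    """
    driver.get(base_url)

    # Fill in the comment box (assumed element id).
    box = driver.find_element(CSS_SELECTOR, "#comment-input")
    box.clear()
    box.send_keys(text)

    # Submit the comment (assumed element id).
    driver.find_element(CSS_SELECTOR, "#comment-submit").click()

    # Verify the outcome by looking for the text among rendered comments
    # (assumed class name).
    posted = driver.find_elements(CSS_SELECTOR, ".comment-text")
    return any(text in element.text for element in posted)
```

With Selenium installed, this could be called as `post_comment_macro(webdriver.Chrome(), "http://localhost:8000", "hello")` after `from selenium import webdriver`, assuming one of the synthetic sites is served locally at that address. In real code, `By.CSS_SELECTOR` would normally be imported rather than spelled out as a string.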

Provider:

hyunjun1121
