autobencher-qa-33k

Name: autobencher-qa-33k
Creator: maas
Published: 2025-12-03 17:29:36
License: No description available

ModelScope Community · Updated 2025-12-03 · Indexed 2025-12-06

Download link:

https://modelscope.cn/datasets/allenai/autobencher-qa-33k

Resource overview:

These are 33K questions generated using [Autobencher](https://arxiv.org/abs/2407.08351). The questions come from randomly sampled Wikipedia articles, which are further filtered and transformed into questions by GPT-4o. This benchmark is used in the [signal and noise](https://huggingface.co/datasets/allenai/signal-and-noise) project to demonstrate the impact of a large sample size on the modeling noise of a benchmark.

### Citation

Please cite the original authors of Autobencher, and our work which generated this particular evaluation set:

```
@article{li2024autobencher,
  title={Autobencher: Towards declarative benchmark construction},
  author={Li, Xiang Lisa and Kaiyom, Farzaan and Liu, Evan Zheran and Mai, Yifan and Liang, Percy and Hashimoto, Tatsunori},
  journal={arXiv preprint arXiv:2407.08351},
  year={2024}
}
```

```
@article{heineman2025signal,
  title={Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation},
  author={Heineman, David and Hofmann, Valentin and Magnusson, Ian and Gu, Yuling and Smith, Noah A and Hajishirzi, Hannaneh and Lo, Kyle and Dodge, Jesse},
  journal={arXiv preprint arXiv:2508.13144},
  year={2025}
}
```

### Dataset Description

- **Developed by:** Allen Institute for AI (Ai2)
- **Language(s) (NLP):** English
- **License:** This dataset contains model outputs generated from GPT-4o, which is subject to OpenAI's [Terms of Use](https://openai.com/policies/row-terms-of-use/). This dataset is licensed under CC BY 4.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use)
- **Contact:** Technical inquiries: `davidh@allenai.org`. Press: `press@allenai.org`
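
The description above says the benchmark is used to show how a large sample size affects benchmark noise. As a rough illustration only, the sketch below computes the textbook binomial standard error of an accuracy estimate for a small and a large question count. It is not the noise metric defined in the Signal and Noise paper; the function name `accuracy_standard_error`, the assumed accuracy of 0.5, and the comparison size of 1,000 questions are all hypothetical choices made for this example.

```python
import math


def accuracy_standard_error(accuracy: float, num_questions: int) -> float:
    """Binomial standard error of an accuracy estimate.

    Assumes each question is an independent correct/incorrect trial,
    which is a simplification and not a claim about this dataset.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if num_questions <= 0:
        raise ValueError("num_questions must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / num_questions)


# Hypothetical accuracy, chosen because 0.5 maximizes the binomial variance.
ASSUMED_ACCURACY = 0.5

# 33_000 matches the "33K" in the dataset name; 1_000 is an arbitrary
# smaller benchmark size used only for comparison.
for n in (1_000, 33_000):
    se = accuracy_standard_error(ASSUMED_ACCURACY, n)
    print(f"n={n:>6}: standard error ~ {se:.4f}")
```

Under these assumptions the standard error shrinks with the square root of the question count, so moving from 1,000 to 33,000 questions reduces it by a factor of about sqrt(33).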

Provider:

maas

Created:

2025-08-25
