
autobencher-qa-33k

Source: ModelScope (魔搭社区). Updated 2025-12-03; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/allenai/autobencher-qa-33k
Description:
These are 33K questions generated using [Autobencher](https://arxiv.org/abs/2407.08351). The questions come from randomly sampled Wikipedia articles, which are further filtered and transformed into questions by GPT-4o.

This benchmark is used in the [signal and noise](https://huggingface.co/datasets/allenai/signal-and-noise) project to demonstrate the impact of a large sample size on the modeling noise of a benchmark.

### Citation

Please cite the original authors of Autobencher, and our work which generated this particular evaluation set:

```
@article{li2024autobencher,
  title={Autobencher: Towards declarative benchmark construction},
  author={Li, Xiang Lisa and Kaiyom, Farzaan and Liu, Evan Zheran and Mai, Yifan and Liang, Percy and Hashimoto, Tatsunori},
  journal={arXiv preprint arXiv:2407.08351},
  year={2024}
}
```

```
@article{heineman2025signal,
  title={Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation},
  author={Heineman, David and Hofmann, Valentin and Magnusson, Ian and Gu, Yuling and Smith, Noah A and Hajishirzi, Hannaneh and Lo, Kyle and Dodge, Jesse},
  journal={arXiv preprint arXiv:2508.13144},
  year={2025}
}
```

### Dataset Description

- **Developed by:** Allen Institute for AI (Ai2)
- **Language(s) (NLP):** English
- **License:** This dataset contains model outputs generated from GPT-4o, which is subject to OpenAI's [Terms of Use](https://openai.com/policies/row-terms-of-use/). This dataset is licensed under CC BY 4.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
- **Contact:** Technical inquiries: `davidh@allenai.org`. Press: `press@allenai.org`
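To give a rough sense of why a large sample size matters for benchmark noise, the sketch below computes the binomial standard error of an accuracy estimate at several question counts. This is an illustrative assumption, not the noise definition used in the signal and noise project: it covers only the sampling error of a single accuracy score under independent questions, and the function name `accuracy_standard_error` and the example accuracy of 0.5 are hypothetical.

```python
import math


def accuracy_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an accuracy estimate.

    Assumes n independent questions, each answered correctly with
    probability p. This is a simplification for illustration only.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Hypothetical accuracy of 0.5 (the worst case for this error term),
    # evaluated at three benchmark sizes up to the 33K questions here.
    for n in (330, 3_300, 33_000):
        se = accuracy_standard_error(0.5, n)
        print(f"n={n:>6}: standard error ~ {se:.4f}")
```

Under these assumptions the standard error shrinks with the square root of the number of questions, so a 100-fold increase in questions reduces this error term by a factor of 10.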

Provider:
maas
Created:
2025-08-25