ScalingIntelligence/monkey_business
Source: Hugging Face. Updated: 2025-10-08. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/ScalingIntelligence/monkey_business
Description:
Monkey Business is a dataset of samples generated by large language models. It contains both correct and incorrect samples from several model families (Llama-3, Gemma, and Pythia) on a range of tasks (problems from GSM8K, MATH, CodeContests, and MiniF2F-MATH). It is intended to support the development of improved verification methods that assess whether a model-generated answer is correct. The dataset was created as part of "Large Language Monkeys: Scaling Inference Compute with Repeated Sampling", a project by Stanford University's Scaling Intelligence Lab.
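Because each problem comes with many samples labeled correct or incorrect, one natural summary is the chance that at least one of k samples is correct. The sketch below uses the standard unbiased pass@k estimator; it is an illustration only, not code from the dataset authors. The `is_correct` list is made-up data, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without
    replacement from n samples of which c are correct, is correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the ways to pick k samples that are all incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical correctness flags for one problem (not real dataset values).
is_correct = [False, True, False, False, True, False, False, False]
n, c = len(is_correct), sum(is_correct)

for k in (1, 2, 4):
    print(f"pass@{k} = {pass_at_k(n, c, k):.3f}")
```

A verifier is useful exactly when this number is high but picking the right sample is hard: the correct answer is somewhere among the k samples, and the verifier has to find it.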
Provided by:
ScalingIntelligence



