guactastesgood/GSM-Ranges
Source: Hugging Face. Updated: 2025-02-14. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/guactastesgood/GSM-Ranges
Description:
GSM-Ranges is a dataset generator built on the GSM8K benchmark. It systematically modifies the numerical values in math word problems to assess how robust the mathematical reasoning of large language models (LLMs) is across a wide range of numerical scales. By introducing these numerical perturbations, it evaluates how well LLMs generalize to out-of-distribution numerical values. Each perturbation level contains 50 sets of 100 questions, for 5,000 problems per level. The dataset also includes the 100 original base questions from GSM8K for comparison.
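To make the "50 sets of 100 questions" structure concrete, here is a minimal sketch of how one perturbation level could be assembled. It is an illustration only, not the actual GSM-Ranges generator: the function names `perturb_numbers` and `make_level`, the sample question, and the range 1,000 to 9,999 are all hypothetical, and the sketch rewrites only integers in the question text without recomputing the answer.

```python
import random
import re

# Matches runs of digits; a simplification that treats every integer alike.
NUMBER = re.compile(r"\d+")


def perturb_numbers(question, low, high, rng):
    """Replace each integer in `question` with a random integer in [low, high]."""
    return NUMBER.sub(lambda m: str(rng.randint(low, high)), question)


def make_level(base_questions, low, high, n_sets=50, seed=0):
    """Build one perturbation level: `n_sets` perturbed copies of the base questions."""
    rng = random.Random(seed)  # seeded so the level is reproducible
    return [
        [perturb_numbers(q, low, high, rng) for q in base_questions]
        for _ in range(n_sets)
    ]


# Hypothetical stand-in for the 100 GSM8K base questions.
base = ["Tom has 3 apples and buys 4 more. How many apples does he have?"] * 100

# One hypothetical level that draws values from the 4-digit range.
level = make_level(base, 1_000, 9_999)
total = sum(len(question_set) for question_set in level)
```

With 100 base questions and 50 sets, `total` matches the 5,000 problems per level stated in the description. A real generator would also need to recompute the ground-truth answer for each perturbed question, which this sketch leaves out.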
Provider:
guactastesgood



