gretel-math-gsm8k-v0
Source: ModelScope community. Updated 2025-11-27; indexed 2025-05-24.
Download link: https://modelscope.cn/datasets/gretelai/gretel-math-gsm8k-v0
Resource description:
# gretelai/gsm8k-synthetic-diverse-405b
This dataset is a synthetically generated set of math word problems inspired by the GSM8K dataset (`https://huggingface.co/datasets/openai/gsm8k`), created entirely using **Gretel Navigator with meta-llama/Meta-Llama-3.1-405B** as the agent LLM. It contains ~1500 grade-school-level math word problems with step-by-step solutions, with an emphasis on diversity of age group, difficulty, and domain.
## Key Features:
- Synthetically Generated: Math problems created using Gretel Navigator, employing evolutionary techniques, LLM-as-a-judge, and verification of annotated calculations via the `sympy` library.
- Stratified Test Set: 300 examples are held out for testing and the remainder are used for training; the split is stratified by topic and difficulty.
- Diverse Contexts and Names: Problems feature a wide range of real-world contexts and include diverse names and ethnicities.
- Age Group Labeling: Each problem is tagged with an appropriate age group (grades 2 through 6).
- Difficulty Categorization: Problems are categorized as easy, medium, hard, or very hard.
- Expanded Domains: Covers 13 topics, including algebra, geometry, percentages, ratios, and probability.
- Step-by-Step Solutions: Clear reasoning with annotated arithmetic operations.
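The card says annotated calculations are verified with `sympy` but does not show how. The following is a minimal sketch of one way such a check could work, not Gretel's actual pipeline. It assumes the annotations follow the GSM8K-style `<<expression=result>>` convention, which this card does not confirm; the function name `check_annotations` and the example solution text are hypothetical.

```python
import re
from sympy import simplify, sympify

# Assumed annotation format (GSM8K-style): <<expression=result>>
ANNOTATION = re.compile(r"<<([^<>=]+)=([^<>=]+)>>")


def check_annotations(solution: str) -> list[tuple[str, str, bool]]:
    """Return (expression, claimed_result, is_correct) for each annotation."""
    results = []
    for lhs, rhs in ANNOTATION.findall(solution):
        try:
            # rational=True parses decimals as exact fractions,
            # which avoids floating-point noise in the comparison.
            diff = sympify(lhs, rational=True) - sympify(rhs, rational=True)
            ok = simplify(diff) == 0
        except Exception:
            # Anything sympy cannot parse is treated as a failed check.
            ok = False
        results.append((lhs.strip(), rhs.strip(), bool(ok)))
    return results


# Hypothetical solution text: the second annotation is deliberately wrong.
example = (
    "Each box holds 48/2 = <<48/2=24>>24 pens. "
    "Two boxes hold 24*2 = <<24*2=50>>50 pens."
)

for expression, claimed, ok in check_annotations(example):
    print(f"{expression} = {claimed}: {'ok' if ok else 'MISMATCH'}")
```

A filter built on this idea would drop or regenerate any problem whose solution contains a mismatched annotation.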
## Dataset Statistics and Distribution

## Gretel Navigator (selected model: meta-llama/Meta-Llama-3.1-405B) Dataset - Distribution Analysis
### Topic Distribution
| Topic | Train | Test |
|:-------------------------|--------:|-------:|
| algebra | 25 | 20 |
| arithmetic | 31 | 25 |
| compound interest | 26 | 21 |
| data interpretation | 27 | 20 |
| exponential growth/decay | 25 | 21 |
| fractions | 29 | 24 |
| geometry | 35 | 29 |
| optimization | 23 | 19 |
| percentages | 37 | 29 |
| polynomials | 21 | 18 |
| probability | 20 | 17 |
| proportions | 30 | 24 |
| ratios | 41 | 33 |
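As a quick arithmetic check, the rows of the topic table can be tallied to get the split sizes. The sketch below only sums the counts exactly as listed in the table above; the variable names are illustrative.

```python
# (train, test) counts copied from the topic distribution table.
topic_counts = {
    "algebra": (25, 20),
    "arithmetic": (31, 25),
    "compound interest": (26, 21),
    "data interpretation": (27, 20),
    "exponential growth/decay": (25, 21),
    "fractions": (29, 24),
    "geometry": (35, 29),
    "optimization": (23, 19),
    "percentages": (37, 29),
    "polynomials": (21, 18),
    "probability": (20, 17),
    "proportions": (30, 24),
    "ratios": (41, 33),
}

train_total = sum(train for train, _ in topic_counts.values())
test_total = sum(test for _, test in topic_counts.values())
test_share = test_total / (train_total + test_total)

print(train_total, test_total, round(test_share, 3))  # 370 300 0.448
```

The test column sums to 300, matching the stated size of the test set.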
### Difficulty Distribution
| Difficulty | Train | Test |
|:-------------|--------:|-------:|
| easy | 93 | 75 |
| medium | 101 | 83 |
| hard | 82 | 67 |
| very hard | 94 | 75 |
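The card describes the split as stratified by topic and difficulty without giving the procedure. The following is a minimal sketch of one common way to stratify on two labels, using only the standard library. The field names `topic` and `difficulty`, the function `stratified_split`, the test fraction, and the toy records are all assumptions for illustration, not the dataset's actual schema or the authors' method.

```python
import random
from collections import defaultdict


def stratified_split(records, keys=("topic", "difficulty"),
                     test_fraction=0.3, seed=0):
    """Split records so each (topic, difficulty) stratum keeps the same test share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[tuple(record[k] for k in keys)].append(record)

    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)  # shuffle within the stratum only
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy data: 2 topics x 2 difficulty levels x 10 problems each (hypothetical).
toy_records = [
    {"id": f"{topic}-{difficulty}-{i}", "topic": topic, "difficulty": difficulty}
    for topic in ("ratios", "geometry")
    for difficulty in ("easy", "hard")
    for i in range(10)
]

train_set, test_set = stratified_split(toy_records)
print(len(train_set), len(test_set))  # 28 12
```

Sampling within each stratum keeps the topic and difficulty mix of the test set close to that of the training set, which is the pattern visible in the two tables above.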
## Citation and Usage
If you use this dataset in your research or applications, please cite it as:
```
@dataset{gretelai_gsm8k_synthetic,
author = {Gretel AI},
title = {Synthetically Generated Math Word Problems Dataset (gsm8k) with enhanced diversity using Gretel Navigator and meta-llama/Meta-Llama-3.1-405B},
year = {2024},
month = {9},
publisher = {Gretel},
howpublished = {https://huggingface.co/gretelai/gsm8k-synthetic-diverse-405b},
}
```
For questions, issues, or additional information, please visit the dataset repository on Hugging Face or contact Gretel AI.
Provider: maas
Created: 2025-05-20



