
gsm8k-synthetic-diverse-8b

ModelScope community · Updated 2025-12-05 · Indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/gretelai/gsm8k-synthetic-diverse-8b
Resource overview:
# gretelai/gsm8k-synthetic-diverse-8b

This dataset is a synthetically generated version inspired by the [GSM8K](https://huggingface.co/datasets/openai/gsm8k) dataset, created entirely using **Gretel Navigator with meta-llama/Meta-Llama-3.1-8B** as the agent LLM. It contains ~1500 Grade School-level math word problems with step-by-step solutions, focusing on age group, difficulty, and domain diversity.

## Key Features:

- Synthetically Generated: Math problems created using Gretel Navigator, employing evolutionary techniques, LLM-as-a-judge, and verification of annotated calculations via the `sympy` library.
- Stratified Test Set: 300 examples for test, remaining for training, stratified by topic and difficulty.
- Diverse Contexts and Names: Problems feature a wide range of real-world contexts and include diverse names and ethnicities.
- Age Group Labeling: Each problem is tagged with an appropriate age group (grades 2 through 6).
- Difficulty Categorization: Problems are categorized as easy, medium, or hard.
- Expanded Domains: Covers a wide range of topics including basic algebra, geometry, and more.
- Step-by-Step Solutions: Clear reasoning with annotated arithmetic operations.
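The feature list mentions verification of annotated calculations via `sympy`. The sketch below only illustrates that idea and is not Gretel's pipeline. It assumes GSM8K-style `<<expression=result>>` annotations (the format is not stated in this card) and, to stay dependency-free, swaps in the standard-library `ast` and `fractions` modules in place of `sympy`. The function name `check_annotations` and the sample sentence are hypothetical.

```python
import ast
import operator
import re
from fractions import Fraction

# Assumption: annotations follow the GSM8K convention <<expression=result>>.
ANNOTATION = re.compile(r"<<([^=<>]+)=([^<>]+)>>")

# Only the four basic arithmetic operators are supported in this sketch.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed arithmetic expression exactly, using Fractions."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        # Going through str() keeps decimals such as 0.1 exact.
        return Fraction(str(node.value))
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def check_annotations(solution):
    """Return (expression, claimed_result, is_correct) for each annotation."""
    results = []
    for expr, claimed in ANNOTATION.findall(solution):
        expr, claimed = expr.strip(), claimed.strip()
        try:
            ok = _evaluate(ast.parse(expr, mode="eval")) == Fraction(claimed)
        except (ValueError, SyntaxError, ZeroDivisionError):
            ok = False  # unparsable or undefined arithmetic counts as a failure
        results.append((expr, claimed, ok))
    return results


if __name__ == "__main__":
    # Hypothetical solution text; the second annotation is deliberately wrong.
    sample = (
        "Mia has 48/2 = <<48/2=24>>24 stickers, "
        "so 48+24 = <<48+24=73>>73 in total."
    )
    print(check_annotations(sample))
```

Run as a script, this prints `[('48/2', '24', True), ('48+24', '73', False)]`, flagging the second annotation as an arithmetic error. Exact rational arithmetic is used so that results like `1/3` are compared without floating-point rounding.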
## Dataset Statistics and Distribution

![meta-llama/Meta-Llama-3.1-8B Dataset Distribution](images/gsm8k-synthetic-diverse-8b_analysis.png)

## Gretel Navigator (selected model: meta-llama/Meta-Llama-3.1-8B) Dataset - Distribution Analysis

### Topic Distribution

| topic                  |   Train |   Test |
|:-----------------------|--------:|-------:|
| arithmetic             |     193 |     38 |
| basic algebra          |     179 |     35 |
| data interpretation    |     202 |     40 |
| fractions              |     181 |     35 |
| geometry               |     171 |     33 |
| percentages            |     203 |     41 |
| ratios and proportions |     201 |     39 |
| word problems          |     198 |     39 |

### Difficulty Distribution

| difficulty   |   Train |   Test |
|:-------------|--------:|-------:|
| easy         |     531 |    104 |
| hard         |     509 |    101 |
| medium       |     488 |     95 |

## Citation and Usage

If you use this dataset in your research or applications, please cite it as:

```
@dataset{gretelai_gsm8k_synthetic,
  author = {Gretel AI},
  title = {Synthetically Generated Math Word Problems Dataset (gsm8k) with enhanced diversity using Gretel Navigator and meta-llama/Meta-Llama-3.1-8B},
  year = {2024},
  month = {9},
  publisher = {Gretel},
  howpublished = {https://huggingface.co/gretelai/gsm8k-synthetic-diverse-8b},
}
```

For questions, issues, or additional information, please visit the dataset repository on Hugging Face or contact Gretel AI.

Provider:
maas
Created:
2025-05-20