
gretelai/gretel-math-gsm8k-v0

Source: Hugging Face | Updated: 2024-09-06 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/gretelai/gretel-math-gsm8k-v0
Resource description:
---
language:
- en
license: llama3.1
multilinguality: monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- question-answering
task_ids:
- closed-domain-qa
paperswithcode_id: gsm8k
---

# gretelai/gsm8k-synthetic-diverse-405b

This is a synthetic dataset inspired by [GSM8K](https://huggingface.co/datasets/openai/gsm8k), created entirely with **Gretel Navigator**, using **meta-llama/Meta-Llama-3.1-405B** as the agent LLM. It contains ~1500 grade-school math word problems with step-by-step solutions, with an emphasis on diversity of age group, difficulty, and domain.

## Key Features

- Synthetically Generated: Math problems were created with Gretel Navigator using evolutionary techniques and LLM-as-a-judge, and the annotated calculations were verified with the `sympy` library.
- Stratified Test Set: 300 examples are held out for test and the rest are used for training, stratified by topic and difficulty.
- Diverse Contexts and Names: Problems are set in a wide range of real-world contexts and use names from diverse ethnic backgrounds.
- Age Group Labeling: Each problem is tagged with an appropriate age group (grades 2 through 6).
- Difficulty Categorization: Problems are labeled easy, medium, hard, or very hard.
- Expanded Domains: Topics include algebra, geometry, percentages, probability, compound interest, and more.
- Step-by-Step Solutions: Solutions show clear reasoning, with each arithmetic operation annotated.
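The card says annotated calculations were verified with `sympy`, but it does not show the annotation format. The sketch below assumes GSM8K-style inline annotations of the form `<<48/2=24>>`, which is an assumption not confirmed for this dataset. To stay dependency-free, it swaps `sympy` for an exact-arithmetic evaluator built on Python's standard `ast` and `fractions` modules. The names `check_annotations` and `_evaluate` are hypothetical.

```python
import ast
import operator
import re
from fractions import Fraction

# ASSUMPTION: GSM8K-style inline annotations such as <<48/2=24>>.
ANNOTATION = re.compile(r"<<([^=<>]+)=([^<>]+)>>")

# Only plain arithmetic is accepted; any other syntax raises ValueError.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def _evaluate(text):
    """Parse an arithmetic string and evaluate it exactly using Fractions.

    Thousands separators are stripped; units such as "$" or "%" are not handled.
    """
    tree = ast.parse(text.replace(",", "").strip(), mode="eval")

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            # str() keeps decimals exact, e.g. Fraction("1.05") == 21/20.
            return Fraction(str(node.value))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
            value = walk(node.operand)
            return -value if isinstance(node.op, ast.USub) else value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported syntax in {text!r}")

    return walk(tree.body)


def check_annotations(solution, tol=Fraction(1, 100)):
    """Return (expression, stated, computed, ok) for every annotation found.

    `tol` absorbs stated results that were rounded, e.g. to cents.
    """
    results = []
    for expr, stated in ANNOTATION.findall(solution):
        computed = _evaluate(expr)
        ok = abs(computed - _evaluate(stated)) <= tol
        results.append((expr, stated, computed, ok))
    return results


if __name__ == "__main__":
    sample = (
        "Maya buys 3 packs of 12 pencils, so she has <<3*12=36>>36 pencils. "
        "She gives half away and keeps <<36/2=17>>17."
    )
    for expr, stated, computed, ok in check_annotations(sample):
        status = "ok" if ok else f"mismatch, computed {computed}"
        print(f"{expr} = {stated}: {status}")
```

Exact `Fraction` arithmetic avoids false mismatches from binary floating point, such as `0.1 + 0.2`. The tolerance is only there to accept results that the solution text rounded.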
## Dataset Statistics and Distribution

![meta-llama/Meta-Llama-3.1-405B Dataset Distribution](images/gsm8k-synthetic-diverse-405b_analysis.png)

## Gretel Navigator (selected model: meta-llama/Meta-Llama-3.1-405B) Dataset - Distribution Analysis

### Topic Distribution

| topic                    |   Train |   Test |
|:-------------------------|--------:|-------:|
| algebra                  |      25 |     20 |
| arithmetic               |      31 |     25 |
| compound interest        |      26 |     21 |
| data interpretation      |      27 |     20 |
| exponential growth/decay |      25 |     21 |
| fractions                |      29 |     24 |
| geometry                 |      35 |     29 |
| optimization             |      23 |     19 |
| percentages              |      37 |     29 |
| polynomials              |      21 |     18 |
| probability              |      20 |     17 |
| proportions              |      30 |     24 |
| ratios                   |      41 |     33 |

### Difficulty Distribution

| difficulty   |   Train |   Test |
|:-------------|--------:|-------:|
| easy         |      93 |     75 |
| medium       |     101 |     83 |
| hard         |      82 |     67 |
| very hard    |      94 |     75 |

## Citation and Usage

If you use this dataset in your research or applications, please cite it as:

```
@dataset{gretelai_gsm8k_synthetic,
  author = {Gretel AI},
  title = {Synthetically Generated Math Word Problems Dataset (gsm8k) with enhanced diversity using Gretel Navigator and meta-llama/Meta-Llama-3.1-405B},
  year = {2024},
  month = {9},
  publisher = {Gretel},
  howpublished = {https://huggingface.co/gretelai/gsm8k-synthetic-diverse-405b},
}
```

For questions, issues, or additional information, please visit the dataset repository on Hugging Face or contact Gretel AI.
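The card says the 300-example test set is stratified by topic and difficulty. The topic table bears this out: each topic puts roughly the same share of its examples into test (for example, algebra has 20 of 45). The sketch below shows one common way to build such a split. It uses proportional allocation per (topic, difficulty) stratum, with largest-remainder rounding so the test set hits an exact size. This is an illustrative assumption, not Gretel's published procedure. The function `stratified_split` and the record fields `topic` and `difficulty` are hypothetical.

```python
import random
from collections import defaultdict
from fractions import Fraction


def stratified_split(records, test_size, keys=("topic", "difficulty"), seed=0):
    """Split records into (train, test) so that test has exactly `test_size`
    items, allocated to each stratum in proportion to the stratum's size."""
    if not 0 <= test_size <= len(records):
        raise ValueError("test_size must be between 0 and len(records)")

    # Group records by stratum, e.g. ("algebra", "easy").
    strata = defaultdict(list)
    for record in records:
        strata[tuple(record[k] for k in keys)].append(record)

    # Exact proportional quota per stratum; take the floor first.
    quotas = {
        s: Fraction(test_size * len(items), len(records)) for s, items in strata.items()
    }
    alloc = {s: int(q) for s, q in quotas.items()}

    # Largest-remainder rounding: give the leftover slots to the strata
    # whose quotas lost the most to flooring.
    leftover = test_size - sum(alloc.values())
    by_remainder = sorted(strata, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1

    # Shuffle each stratum with a seeded RNG, then cut off its test share.
    rng = random.Random(seed)
    train, test = [], []
    for s, items in strata.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        test.extend(shuffled[: alloc[s]])
        train.extend(shuffled[alloc[s]:])
    return train, test
```

Calling it with `test_size=300` would mirror the card's stated test-set size. The fixed `seed` makes the split reproducible across runs.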
Provided by:
gretelai