
open-multilang/gsm8k-106

Source: Hugging Face. Updated 2026-04-09. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/open-multilang/gsm8k-106
Resource summary:
---
license: mit
---

# GSM8K-106

GSM8K-106 is a multilingual extension of GSM8K with translations of grade-school math word problems in 106 languages. It is intended for multilingual math reasoning and evaluation.

## Dataset Summary

- Languages: 106
- Splits: `train`, `test`
- Format: JSONL
- Source dataset: GSM8K
- Alignment key: `original_id`

## Dataset Structure

Each row is one GSM8K example in one language.

### Fields

- `id`: unique example id, e.g. `train_af_0`
- `original_id`: original GSM8K id
- `split`: `train` or `test`
- `language`: ISO language code
- `language_name`: language name
- `question`: translated question
- `answer`: translated answer with reasoning
- `answer_number`: final numeric answer
- `original_question`: original English question
- `original_answer`: original English answer with reasoning
- `original_answer_number`: final numeric answer in the original example

### Example

```json
{
  "id": "train_af_0",
  "original_id": 0,
  "split": "train",
  "language": "af",
  "language_name": "Afrikaans",
  "question": "Natalia het in April klippies aan 48 van haar vriende verkoop, en toe het sy in Mei half soveel klippies verkoop. Hoeveel klippies het Natalia altesaam in April en Mei verkoop?",
  "answer": "Natalia het 48/2 = <<48/2=24>>24 knippe in Mei verkoop.\nNatalia het 48+24 = <<48+24=72>>72 knippe altesaam in April en Mei verkoop.\n#### 72",
  "answer_number": 72,
  "original_question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
  "original_answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72",
  "original_answer_number": 72
}
```

## Repository Structure

```text
data/
  train/
    af.jsonl
    ar.jsonl
    ...
  test/
    af.jsonl
    ar.jsonl
    ...
```

## Attribution

This dataset is based on the GSM8K dataset: Cobbe et al., "Training Verifiers to Solve Math Word Problems", 2021.

We translate and extend the dataset into 106 languages for multilingual research.

## License

This dataset is released under the MIT License. It is a derivative work based on the GSM8K dataset, which is also licensed under the MIT License.
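
## Usage Sketch

The fields and the `original_id` alignment key described above can be exercised with a few lines of standard-library Python. The following is a minimal sketch, not an official loader: the helper names (`parse_jsonl`, `final_number`, `is_consistent`, `align_by_original_id`) are hypothetical, the sample row keeps only a subset of the documented fields, and the second row with language code `xx` is a placeholder invented purely to show alignment. It also assumes the final answer follows the `####` marker, as in the example row, and keys the alignment on `(split, original_id)` in case ids repeat across splits.

```python
import json
import re
from collections import defaultdict

# Row taken from the README example (only the fields used below are kept).
README_ROW = {
    "id": "train_af_0",
    "original_id": 0,
    "split": "train",
    "language": "af",
    "answer": (
        "Natalia het 48/2 = <<48/2=24>>24 knippe in Mei verkoop.\n"
        "Natalia het 48+24 = <<48+24=72>>72 knippe altesaam in April en Mei verkoop.\n"
        "#### 72"
    ),
    "answer_number": 72,
    "original_answer_number": 72,
}

# Hypothetical second row: "xx" is a placeholder code, not a real dataset language.
PLACEHOLDER_ROW = dict(README_ROW, id="train_xx_0", language="xx")

# Assumption: the final answer is the number that follows the "####" marker.
FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def parse_jsonl(text):
    """Parse JSONL text (one JSON object per non-empty line) into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def final_number(answer):
    """Return the number after '####' as a float, or None if the marker is missing."""
    match = FINAL_ANSWER.search(answer)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def is_consistent(row):
    """Check that the parsed answer matches both numeric answer fields."""
    parsed = final_number(row["answer"])
    return (
        parsed is not None
        and parsed == row["answer_number"] == row["original_answer_number"]
    )


def align_by_original_id(rows):
    """Group rows so that translations of the same problem sit together."""
    aligned = defaultdict(dict)
    for row in rows:
        aligned[(row["split"], row["original_id"])][row["language"]] = row
    return dict(aligned)


# Simulate the contents of one JSONL file with the two sample rows.
jsonl_text = "\n".join(json.dumps(r) for r in (README_ROW, PLACEHOLDER_ROW))
rows = parse_jsonl(jsonl_text)
aligned = align_by_original_id(rows)
```

With real data, `jsonl_text` would instead be read from a file such as `data/train/af.jsonl`, and `aligned[("train", 0)]` would map each language code to its translation of the same original problem.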
Provider:
open-multilang