open-multilang/gsm8k-106
Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/open-multilang/gsm8k-106
Resource description:
---
license: mit
---
# GSM8K-106
GSM8K-106 is a multilingual extension of GSM8K that provides translations of its grade-school math word problems into 106 languages. It is intended for multilingual math reasoning research and evaluation.
## Dataset Summary
- Languages: 106
- Splits: `train`, `test`
- Format: JSONL
- Source dataset: GSM8K
- Alignment key: `original_id`
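Since `original_id` is the alignment key, parallel rows can be grouped across languages. The following is a minimal sketch, not the maintainers' tooling: the helper name `align_by_original_id` and the toy rows are illustrative, and keying on `(split, original_id)` assumes that ids may repeat between `train` and `test`.

```python
from collections import defaultdict


def align_by_original_id(rows):
    """Group rows as {(split, original_id): {language: row}}."""
    aligned = defaultdict(dict)
    for row in rows:
        # Assumption: original_id is only unique within a split,
        # so the split is part of the grouping key.
        key = (row["split"], row["original_id"])
        aligned[key][row["language"]] = row
    return dict(aligned)


# Toy rows for illustration only; real rows carry all documented fields.
rows = [
    {"split": "train", "original_id": 0, "language": "af"},
    {"split": "train", "original_id": 0, "language": "ar"},
    {"split": "test", "original_id": 0, "language": "af"},
]

aligned = align_by_original_id(rows)
for key, by_language in aligned.items():
    print(key, sorted(by_language))
```

Grouping into a dictionary keyed by language makes it straightforward to compare the same problem in any pair of languages.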
## Dataset Structure
Each row is one GSM8K example in one language.
### Fields
- `id`: unique example id, e.g. `train_af_0`
- `original_id`: original GSM8K id
- `split`: `train` or `test`
- `language`: ISO language code, e.g. `af`
- `language_name`: language name
- `question`: translated question
- `answer`: translated answer with reasoning
- `answer_number`: final numeric answer
- `original_question`: original English question
- `original_answer`: original English answer with reasoning
- `original_answer_number`: final numeric answer in the original example
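The field list above can be turned into a lightweight row check. This is a sketch under stated assumptions: `check_row` is a hypothetical helper, the id pattern `<split>_<language>_<n>` is inferred from the single example `train_af_0`, and the expectation that `answer_number` equals `original_answer_number` is an assumption about how the translations were produced, not a documented guarantee.

```python
EXPECTED_FIELDS = {
    "id", "original_id", "split", "language", "language_name",
    "question", "answer", "answer_number",
    "original_question", "original_answer", "original_answer_number",
}


def check_row(row):
    """Return a list of human-readable problems found in one row."""
    missing = EXPECTED_FIELDS - set(row)
    if missing:
        return [f"missing fields: {sorted(missing)}"]

    problems = []
    if row["split"] not in ("train", "test"):
        problems.append(f"unexpected split: {row['split']!r}")

    # Assumption: ids look like `<split>_<language>_<n>`, as in `train_af_0`.
    prefix = f"{row['split']}_{row['language']}_"
    if not row["id"].startswith(prefix):
        problems.append(f"id {row['id']!r} does not start with {prefix!r}")

    # Assumption: translation should leave the final number unchanged.
    if row["answer_number"] != row["original_answer_number"]:
        problems.append("answer_number differs from original_answer_number")
    return problems


# Placeholder strings stand in for the long text fields.
row = {
    "id": "train_af_0", "original_id": 0, "split": "train",
    "language": "af", "language_name": "Afrikaans",
    "question": "<translated question>", "answer": "<translated answer>",
    "answer_number": 72,
    "original_question": "<English question>",
    "original_answer": "<English answer>",
    "original_answer_number": 72,
}
print(check_row(row))
```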
### Example
```json
{
  "id": "train_af_0",
  "original_id": 0,
  "split": "train",
  "language": "af",
  "language_name": "Afrikaans",
  "question": "Natalia het in April klippies aan 48 van haar vriende verkoop, en toe het sy in Mei half soveel klippies verkoop. Hoeveel klippies het Natalia altesaam in April en Mei verkoop?",
  "answer": "Natalia het 48/2 = <<48/2=24>>24 knippe in Mei verkoop.\nNatalia het 48+24 = <<48+24=72>>72 knippe altesaam in April en Mei verkoop.\n#### 72",
  "answer_number": 72,
  "original_question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
  "original_answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72",
  "original_answer_number": 72
}
```
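In the example, the `answer` strings keep the GSM8K conventions: calculator annotations of the form `<<expression=result>>` and a final line starting with `####`. The sketch below extracts both with regular expressions. The helper names `final_number` and `calc_annotations` are illustrative, and comparing annotations between the translated and original answers is one possible sanity check, not a documented part of the dataset pipeline.

```python
import re

# Final answer marker, e.g. "#### 72" (thousands separators tolerated).
FINAL_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")
# Calculator annotation, e.g. "<<48/2=24>>".
CALC_RE = re.compile(r"<<([^=<>]+)=([^<>]+)>>")


def final_number(answer):
    """Return the number after '####', or None if the marker is absent."""
    match = FINAL_RE.search(answer)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    return int(value) if value.is_integer() else value


def calc_annotations(answer):
    """Return (expression, result) string pairs from <<...>> annotations."""
    return CALC_RE.findall(answer)


translated = (
    "Natalia het 48/2 = <<48/2=24>>24 knippe in Mei verkoop.\n"
    "Natalia het 48+24 = <<48+24=72>>72 knippe altesaam in April en Mei verkoop.\n"
    "#### 72"
)
original = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)

print(final_number(translated))
# The arithmetic should survive translation unchanged.
same_math = calc_annotations(translated) == calc_annotations(original)
print(same_math)
```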
## Repository Structure
```text
data/
  train/
    af.jsonl
    ar.jsonl
    ...
  test/
    af.jsonl
    ar.jsonl
    ...
```
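Given this layout, one language file can be read with the Python standard library alone. This is a minimal sketch that assumes a local copy of the repository. The helpers `language_file` and `read_jsonl` and the root directory name `gsm8k-106` are hypothetical.

```python
import json
from pathlib import Path


def language_file(root, split, language):
    """Build the path data/<split>/<language>.jsonl under a local root."""
    return Path(root) / "data" / split / f"{language}.jsonl"


def read_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


# Hypothetical local checkout directory; adjust to where the files live.
path = language_file("gsm8k-106", "train", "af")
if path.exists():
    rows = read_jsonl(path)
    print(len(rows), rows[0]["id"])
```

Reading with an explicit `encoding="utf-8"` matters here because most of the 106 languages use non-ASCII text.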
## Attribution
This dataset is based on the GSM8K dataset:
Cobbe et al., "Training Verifiers to Solve Math Word Problems", 2021.
We translate and extend the dataset into 106 languages for multilingual research.
## License
This dataset is released under the MIT License.
It is a derivative work based on the GSM8K dataset, which is also licensed under the MIT License.
Provider:
open-multilang



