barandinho/turkish-math-rlvr
Hugging Face | Updated 2025-12-08 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/barandinho/turkish-math-rlvr
Resource description:
---
dataset_info:
  features:
  - name: problem
    dtype: string
  - name: answer
    dtype: string
  - name: source
    dtype: string
  - name: original_idx
    dtype: int64
  - name: original_id
    dtype: string
  - name: level
    dtype: float64
  - name: subject
    dtype: string
  - name: pass_rate
    dtype: float64
  splits:
  - name: train
    num_bytes: 736919
    num_examples: 1980
  - name: test
    num_bytes: 94244
    num_examples: 220
  download_size: 418983
  dataset_size: 831163
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
# Turkish Math Reasoning Dataset for RLVR training
## What this dataset is
This dataset is a **Turkish math reasoning benchmark augmented with a weak-model pass-rate difficulty signal**, designed for **curriculum learning, GRPO / RLVR-style training, and evaluation of reasoning models** on Turkish math problems.
It is constructed by **merging two Turkish math datasets** and annotating each problem with a **pass rate computed by the `google/gemma-3-4b-it` model**.
Each example represents **one math problem in Turkish**, together with its ground-truth answer and a **difficulty proxy** (`pass_rate ∈ {0, 0.25, 0.5, 0.75, 1.0}`).
---
## Data sources
The dataset combines:
- **`barandinho/amc_turkish`**
  Turkish translations of 1700 AMC competition math problems.
- **`bezir/MATH-500-multilingual` (Turkish split)**
  Translated subset of the MATH benchmark covering algebra, geometry, precalculus, etc.
Both sources are **answer-verifiable symbolic math problems**.
---
## Pass rate: what it means
For each problem:
- A **weak Turkish-capable model** (`google/gemma-3-4b-it`) is run **4 independent times**
- Generation parameters:
  - `temperature = 0.7`
  - `top_k = 64`
  - `top_p = 0.95`
  - Same prompt, stochastic sampling
- Each generation is:
  - Parsed
  - Formally verified using `math_verify`
### `pass_rate` definition
```text
pass_rate = (# correct solutions) / 4
```
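The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used to build the dataset: `pass_rate` and `exact_match` are hypothetical helpers, the sample generations are made up, and `exact_match` is a plain string comparison standing in for the equivalence check that `math_verify` performs.

```python
from typing import Callable, Sequence


def pass_rate(
    generations: Sequence[str],
    answer: str,
    is_correct: Callable[[str, str], bool],
) -> float:
    """Return the fraction of generations accepted by the checker."""
    if not generations:
        raise ValueError("need at least one generation")
    correct = sum(1 for g in generations if is_correct(g, answer))
    return correct / len(generations)


def exact_match(generation: str, answer: str) -> bool:
    # Assumption: the final answer is the last non-empty line of the output.
    # A real verifier would test mathematical equivalence, not string equality.
    lines = [ln.strip() for ln in generation.splitlines() if ln.strip()]
    return bool(lines) and lines[-1] == answer.strip()


# Four made-up generations for one problem whose ground-truth answer is "12".
samples = ["x = 3\n12", "12", "13", "so the result is\n12"]
rate = pass_rate(samples, "12", exact_match)
print(rate)  # 0.75
```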
| pass_rate | Interpretation |
| --------: | ------------------------------------- |
| 0.00 | Model never solves it → **very hard** |
| 0.25–0.50 | Occasionally solved → **medium** |
| 0.75–1.00 | Almost always solved → **easy** |
This provides a **model-based difficulty estimate** without human annotation, suitable for:
* Curriculum filtering
* Hard-problem mining
* RL reward shaping
* Controlled difficulty splits
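
As a rough illustration of curriculum filtering and hard-problem mining, the sketch below maps `pass_rate` to the labels in the interpretation table, selects never-solved problems, and orders rows from easy to hard. The function `difficulty_bucket` and the toy rows are assumptions made for this example; they are not part of the dataset or of any library.

```python
def difficulty_bucket(pass_rate: float) -> str:
    """Map a pass rate to the labels used in the interpretation table."""
    if pass_rate == 0.0:
        return "very hard"
    if pass_rate <= 0.5:
        return "medium"
    return "easy"


# Toy rows with the same `pass_rate` field as the dataset (values are made up).
rows = [
    {"problem": "P1", "pass_rate": 0.0},
    {"problem": "P2", "pass_rate": 0.25},
    {"problem": "P3", "pass_rate": 1.0},
    {"problem": "P4", "pass_rate": 0.5},
]

# Hard-problem mining: keep only problems the weak model never solved.
hard_only = [r for r in rows if r["pass_rate"] == 0.0]

# Curriculum ordering: highest pass rate (easiest) first.
curriculum = sorted(rows, key=lambda r: r["pass_rate"], reverse=True)

print([r["problem"] for r in hard_only])   # ['P1']
print([r["problem"] for r in curriculum])  # ['P3', 'P4', 'P2', 'P1']
```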
---
## Dataset splits
A **stratified train/test split** is used:
* Stratified jointly by:
  * `source` (AMC vs MATH-500)
  * `pass_rate` bucket (`hardest / hard / easy`)
* Preserves:
  * Source proportions
  * Difficulty distribution
| Split | Examples |
| ----- | -------- |
| Train | 1,980 |
| Test | 220 |
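
A joint stratified split of this kind can be sketched with the standard library alone. The code below is an assumption-laden illustration rather than the procedure actually used: `stratified_split` and `bucket` are hypothetical helpers, the bucket thresholds are guessed from the interpretation table, the seed is arbitrary, and the 0.1 test fraction is inferred from the 1,980 / 220 counts.

```python
import random
from collections import defaultdict


def bucket(pass_rate: float) -> str:
    # Assumed thresholds for the hardest / hard / easy buckets.
    if pass_rate == 0.0:
        return "hardest"
    return "hard" if pass_rate <= 0.5 else "easy"


def stratified_split(rows, key, test_fraction=0.1, seed=0):
    """Split rows so each stratum contributes about test_fraction to test."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for stratum in sorted(groups):  # fixed order keeps the split reproducible
        members = list(groups[stratum])
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data: 2 sources x 2 pass rates x 10 rows = 40 rows in 4 strata.
rows = [
    {"source": s, "pass_rate": p, "original_idx": i}
    for s in ("amc_turkish", "math_500_tr")
    for p in (0.0, 1.0)
    for i in range(10)
]
train, test = stratified_split(
    rows, key=lambda r: (r["source"], bucket(r["pass_rate"]))
)
print(len(train), len(test))  # 36 4
```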
---
## Dataset structure
### Columns
| Field | Description |
| -------------- | ------------------------------ |
| `problem` | Math problem text (Turkish) |
| `answer` | Ground-truth answer |
| `source` | `amc_turkish` or `math_500_tr` |
| `original_idx` | Original dataset index |
| `original_id` | Stable ID from source dataset |
| `level` | Problem level (when available) |
| `subject` | Math category (when available) |
| `pass_rate` | Weak-model pass rate ∈ [0, 1] |
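
For code that consumes rows one at a time, the columns above can be mirrored in a small typed record. This is only a sketch: `MathExample` and `parse_example` are hypothetical names, the sample row is invented, and representing unavailable `level` and `subject` values as `None` is an assumption about how missing values appear.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_PASS_RATES = {0.0, 0.25, 0.5, 0.75, 1.0}


@dataclass(frozen=True)
class MathExample:
    problem: str
    answer: str
    source: str
    original_idx: int
    original_id: str
    level: Optional[float]   # only present for some problems
    subject: Optional[str]   # only present for some problems
    pass_rate: float


def parse_example(row: dict) -> MathExample:
    """Build a typed record from a raw row and check the pass rate value."""
    example = MathExample(
        problem=row["problem"],
        answer=row["answer"],
        source=row["source"],
        original_idx=row["original_idx"],
        original_id=row["original_id"],
        level=row.get("level"),
        subject=row.get("subject"),
        pass_rate=row["pass_rate"],
    )
    if example.pass_rate not in ALLOWED_PASS_RATES:
        raise ValueError(f"unexpected pass_rate: {example.pass_rate}")
    return example


# Invented row for illustration; not taken from the dataset.
example = parse_example({
    "problem": "2 + 2 kaçtır?",
    "answer": "4",
    "source": "amc_turkish",
    "original_idx": 0,
    "original_id": "example-0",
    "pass_rate": 1.0,
})
print(example.source, example.level, example.pass_rate)  # amc_turkish None 1.0
```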
---
## Intended use
Designed for:
* RLVR training on math reasoning
* Curriculum learning by difficulty
* Hard-problem filtering (`pass_rate == 0`)
* Turkish math reasoning benchmark evaluation
* Weak-to-strong generalization experiments
Not intended for:
* General factual QA
* Non-math natural language tasks
* Multi-modal grounding
---
## Citation
If you use this dataset, please credit the original source datasets.
Provider:
barandinho



