
barandinho/turkish-math-rlvr

Hugging Face · Updated 2025-12-08 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/barandinho/turkish-math-rlvr
Resource description:
---
dataset_info:
  features:
  - name: problem
    dtype: string
  - name: answer
    dtype: string
  - name: source
    dtype: string
  - name: original_idx
    dtype: int64
  - name: original_id
    dtype: string
  - name: level
    dtype: float64
  - name: subject
    dtype: string
  - name: pass_rate
    dtype: float64
  splits:
  - name: train
    num_bytes: 736919
    num_examples: 1980
  - name: test
    num_bytes: 94244
    num_examples: 220
  download_size: 418983
  dataset_size: 831163
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---

# Turkish Math Reasoning Dataset for RLVR training

## What this dataset is

This dataset is a **Turkish math reasoning benchmark augmented with a weak-model pass-rate difficulty signal**, designed for **curriculum learning, GRPO / RLVR-style training, and evaluation of reasoning models** on Turkish math problems.

It is constructed by **merging two Turkish math datasets** and annotating each problem with a **pass rate computed by the `google/gemma-3-4b-it` model**.

Each example represents **one math problem in Turkish**, together with its ground-truth answer and a **difficulty proxy** (`pass_rate ∈ {0, 0.25, 0.5, 0.75, 1.0}`).

---

## Data sources

The dataset combines:

- **`barandinho/amc_turkish`**
  Turkish translations of 1700 AMC competition math problems.
- **`bezir/MATH-500-multilingual` (Turkish split)**
  Translated subset of the MATH benchmark covering algebra, geometry, precalculus, etc.

Both sources are **answer-verifiable symbolic math problems**.
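As a rough illustration of the schema declared in the metadata above, the following sketch checks that a single row has the expected columns and that its `pass_rate` is one of the five possible values. The helper `validate_row`, the constant names, and the choice to treat `level` and `subject` as nullable are assumptions made for this example, not part of the dataset or of any library.

```python
from typing import Any, Mapping

# Column names and Python-side types mirroring the `dataset_info` features.
EXPECTED_FIELDS = {
    "problem": str,
    "answer": str,
    "source": str,
    "original_idx": int,
    "original_id": str,
    "level": float,
    "subject": str,
    "pass_rate": float,
}

# Assumption for this sketch: only these two columns may be missing a value.
NULLABLE_FIELDS = {"level", "subject"}

# With 4 samples per problem, the pass rate can only take these values.
ALLOWED_PASS_RATES = {0.0, 0.25, 0.5, 0.75, 1.0}


def validate_row(row: Mapping[str, Any]) -> list[str]:
    """Return a list of problems found in the row; an empty list means it looks fine."""
    issues: list[str] = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in row:
            issues.append(f"missing field: {name}")
            continue
        value = row[name]
        if value is None:
            if name not in NULLABLE_FIELDS:
                issues.append(f"unexpected null: {name}")
            continue
        # Accept ints where a float is expected (e.g. a pass rate of 0 or 1).
        accepted = (int, float) if expected is float else expected
        if not isinstance(value, accepted):
            issues.append(f"wrong type for {name}: {type(value).__name__}")
    rate = row.get("pass_rate")
    if isinstance(rate, (int, float)) and rate not in ALLOWED_PASS_RATES:
        issues.append(f"pass_rate outside the five expected values: {rate}")
    return issues


# A made-up row for illustration only; it is not taken from the dataset.
example_row = {
    "problem": "2 + 2 kaç eder?",
    "answer": "4",
    "source": "amc_turkish",
    "original_idx": 0,
    "original_id": "hypothetical-id",
    "level": None,
    "subject": None,
    "pass_rate": 1.0,
}
print(validate_row(example_row))
```

In practice the rows would likely come from the Hugging Face `datasets` library, for example `load_dataset("barandinho/turkish-math-rlvr", split="train")`, with each row passed to the checker as a plain dictionary.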
---

## Pass rate: what it means

For each problem:

- A **weak Turkish-capable model** (`google/gemma-3-4b-it`) is run **4 independent times**
- Generation parameters:
  - `temperature = 0.7`
  - `top_k = 64`
  - `top_p = 0.95`
  - Same prompt, stochastic sampling
- Each generation is:
  - Parsed
  - Formally verified using `math_verify`

### `pass_rate` definition

```text
pass_rate = (# correct solutions) / 4
```

| pass_rate | Interpretation                        |
| --------: | ------------------------------------- |
|      0.00 | Model never solves it → **very hard** |
| 0.25–0.50 | Occasionally solved → **medium**      |
| 0.75–1.00 | Almost always solved → **easy**       |

This provides a **model-based difficulty estimate** without human annotation, suitable for:

* Curriculum filtering
* Hard-problem mining
* RL reward shaping
* Controlled difficulty splits

---

## Dataset splits

A **stratified train/test split** is used:

* Stratified jointly by:
  * `source` (AMC vs MATH-500)
  * `pass_rate` bucket (`hardest / hard / easy`)
* Preserves:
  * Source proportions
  * Difficulty distribution

| Split | Examples |
| ----- | -------- |
| Train | 1,980    |
| Test  | 220      |

---

## Dataset structure

### Columns

| Field          | Description                    |
| -------------- | ------------------------------ |
| `problem`      | Math problem text (Turkish)    |
| `answer`       | Ground-truth answer            |
| `source`       | `amc_turkish` or `math_500_tr` |
| `original_idx` | Original dataset index         |
| `original_id`  | Stable ID from source dataset  |
| `level`        | Problem level (when available) |
| `subject`      | Math category (when available) |
| `pass_rate`    | Weak-model pass rate ∈ [0, 1]  |

---

## Intended use

Designed for:

* RLVR training on math reasoning
* Curriculum learning by difficulty
* Hard-problem filtering (`pass_rate == 0`)
* Turkish math reasoning benchmark evaluation
* Weak-to-strong generalization experiments

Not intended for:

* General factual QA
* Non-math natural language tasks
* Multi-modal grounding

---

## Citation

If you use this dataset, please credit the original source datasets.
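The `pass_rate` definition and the difficulty table can be sketched in a few lines. This is a minimal illustration, not the author's pipeline: it assumes the four per-sample verdicts have already been produced by a verifier such as `math_verify`, and the function names (`pass_rate`, `difficulty_bucket`, `hardest_only`, `easy_to_hard`) are hypothetical. Ordering a curriculum from easy to hard is likewise an assumption about how the signal might be used.

```python
from typing import Any, Mapping, Sequence

N_SAMPLES = 4  # the card states 4 independent generations per problem


def pass_rate(verdicts: Sequence[bool]) -> float:
    """Fraction of the 4 sampled solutions that were verified as correct."""
    if len(verdicts) != N_SAMPLES:
        raise ValueError(f"expected {N_SAMPLES} verdicts, got {len(verdicts)}")
    return sum(verdicts) / N_SAMPLES


def difficulty_bucket(rate: float) -> str:
    """Map a pass rate to the interpretation given in the card's table."""
    if rate == 0.0:
        return "very hard"
    if rate <= 0.5:
        return "medium"
    return "easy"


def hardest_only(rows: Sequence[Mapping[str, Any]]) -> list[Mapping[str, Any]]:
    """Hard-problem filtering: keep rows the weak model never solved."""
    return [row for row in rows if row["pass_rate"] == 0]


def easy_to_hard(rows: Sequence[Mapping[str, Any]]) -> list[Mapping[str, Any]]:
    """One possible curriculum order: highest pass rate (easiest) first."""
    return sorted(rows, key=lambda row: row["pass_rate"], reverse=True)


# Made-up rows for illustration; only the `pass_rate` column matters here.
rows = [
    {"original_id": "a", "pass_rate": 0.0},
    {"original_id": "b", "pass_rate": 0.75},
    {"original_id": "c", "pass_rate": 0.25},
]
print(pass_rate([True, False, False, True]))
print([difficulty_bucket(row["pass_rate"]) for row in rows])
print([row["original_id"] for row in easy_to_hard(rows)])
```

Because only four samples are drawn per problem, the estimate is coarse: a problem with `pass_rate == 0` may still be solvable by the weak model on a later attempt, so filters built on it are best treated as approximate.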
Provider:
barandinho