
yashmarathe/eka-rl

Source: Hugging Face. Updated 2026-03-30. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/yashmarathe/eka-rl
Description:
---
license: apache-2.0
language:
- en
tags:
- math
- reasoning
- reinforcement-learning
- grpo
- eka
size_categories:
- 100K<n<1M
dataset_info:
  features:
  - name: problem
    dtype: string
  - name: answer
    dtype: string
  - name: domain
    dtype: string
  - name: solve_rate
    dtype: float64
  splits:
  - name: train
    num_bytes: 74008065
    num_examples: 251122
  download_size: 44604381
  dataset_size: 74008065
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# eka-rl

A dataset of **251,122 verified math problems** designed for reinforcement learning training of math-reasoning language models. Each problem has a verified correct answer, enabling straightforward binary outcome rewards (correct / wrong) without a process reward model or verifier LLM.

Used with **[eka-finetune](https://github.com/yash-marathe/eka-mono/tree/main/packages/eka-finetune)** to train models via GRPO (Group Relative Policy Optimisation).

---

## Dataset Summary

| Property | Value |
|---|---|
| Problems | 251,122 |
| Split | `train` only |
| Answer format | Plain text / LaTeX |
| Domains | Algebra, Geometry, Number Theory, Combinatorics, Calculus, and more |
| Difficulty range | `solve_rate` 0.0 (hardest) → 1.0 (easiest) |

---

## Columns

| Column | Type | Description |
|---|---|---|
| `problem` | `string` | Math problem statement in LaTeX |
| `answer` | `string` | Verified ground-truth answer |
| `domain` | `string` | Math domain tags (e.g. `['Algebra -> Equations']`) |
| `solve_rate` | `float64` | Fraction of model attempts that are correct; a proxy for problem difficulty |

---

## Usage

### Load the full dataset

```python
from datasets import load_dataset

ds = load_dataset("yashmarathe/eka-rl", split="train")
print(ds[0])
# {
#   "problem": "Find all integers n such that n^2 + 3 is divisible by 7.",
#   "answer": "n ≡ 2 or 5 (mod 7)",
#   "domain": "['Mathematics -> Number Theory -> Congruences']",
#   "solve_rate": 0.25
# }
```

### Filter by difficulty

`solve_rate` is a useful curriculum signal: start with easier problems and gradually increase difficulty.

```python
from datasets import load_dataset

ds = load_dataset("yashmarathe/eka-rl", split="train")

# Medium difficulty (20–60% solve rate)
medium = ds.filter(lambda x: 0.2 <= float(x["solve_rate"]) <= 0.6)

# Hard only (< 30% solve rate)
hard = ds.filter(lambda x: float(x["solve_rate"]) < 0.3)

print(f"Medium: {len(medium)} problems")
print(f"Hard: {len(hard)} problems")
```

### Use with eka-finetune GRPO training

This dataset is the default for `train_grpo.py` in [eka-finetune](https://github.com/yash-marathe/eka-mono/tree/main/packages/eka-finetune):

```yaml
# configs/rl_config.yaml
dataset_name: "yashmarathe/eka-rl"
num_samples: 50000
min_solve_rate: 0.0
max_solve_rate: 0.8  # exclude trivially easy problems
```

```bash
# Run GRPO training
python3 train_grpo.py --config configs/rl_config.yaml
```

---

## Difficulty Distribution

The `solve_rate` column reflects empirical difficulty: the lower the solve rate, the harder the problem.

| Difficulty | Solve rate range | Approx. problems |
|---|---|---|
| Very Hard | 0.0 – 0.1 | ~40K |
| Hard | 0.1 – 0.3 | ~70K |
| Medium | 0.3 – 0.6 | ~80K |
| Easy | 0.6 – 0.8 | ~40K |
| Trivial | 0.8 – 1.0 | ~20K |

For RL training, filtering to `max_solve_rate=0.8` removes trivially easy problems that provide little or no learning signal (the model already solves them in most rollouts, so the reward is close to constant within a group).
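
To illustrate why a constant reward carries no signal, here is a minimal sketch of the group-relative advantage commonly used in GRPO, where each rollout's reward is standardised against the other rollouts for the same problem. The function name `group_advantages` and the `eps` value are assumptions made for illustration, and this is not necessarily how `train_grpo.py` computes it.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Standardise binary rewards within one group of rollouts.

    Hypothetical helper for illustration only: subtract the group mean
    and divide by the group standard deviation (plus eps for stability).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A problem the model solves in half of its rollouts:
# correct rollouts get a positive advantage, wrong ones a negative one.
mixed = group_advantages([1.0, 0.0, 0.0, 1.0])

# A trivially easy problem, solved in every rollout:
# every advantage is 0.0, so the policy gradient for this group vanishes.
trivial = group_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)
print(trivial)
```

Under this formulation the same thing happens when every rollout in a group is wrong, so problems at the very bottom of the `solve_rate` range may also contribute few useful updates early in training.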

---

## Answer Verification

Answers can be verified symbolically using `sympy`:

```python
import re

import sympy
from sympy.parsing.latex import parse_latex  # needs the antlr4-python3-runtime package


def _normalise(s: str) -> str:
    """Strip all whitespace and lowercase, for the string-match fallback."""
    return re.sub(r"\s+", "", s).lower()


def verify(predicted: str, ground_truth: str) -> bool:
    """Return True if the two answers match symbolically or as normalised text."""
    try:
        # Two expressions are equal if their difference simplifies to zero
        diff = sympy.simplify(parse_latex(predicted) - parse_latex(ground_truth))
        return diff == 0
    except Exception:
        # Parsing or simplification failed: fall back to normalised string match
        return _normalise(predicted) == _normalise(ground_truth)
```

This is the verification logic used by the correctness reward in `train_grpo.py`.

---

## Domains

Problems span the full range of competition and undergraduate mathematics:

- **Algebra**: equations, inequalities, polynomials, sequences
- **Geometry**: Euclidean, coordinate, trigonometry
- **Number Theory**: divisibility, congruences, primes
- **Combinatorics**: counting, probability, graph theory
- **Calculus**: limits, derivatives, integrals, series
- **Linear Algebra**: matrices, eigenvalues, vector spaces

---

## Related

- **Model:** [yashmarathe/Eka-4B](https://huggingface.co/yashmarathe/Eka-4B), a 4B reasoning model trained with this dataset
- **Training code:** [eka-finetune](https://github.com/yash-marathe/eka-mono/tree/main/packages/eka-finetune), the GRPO training script (`train_grpo.py`)
- **SFT dataset:** [yashmarathe/OpenMathReasoning](https://huggingface.co/datasets/yashmarathe/OpenMathReasoning), 1M CoT samples for supervised fine-tuning

---

## License

Apache 2.0
Provider:
yashmarathe