Nirav-Madhani/meditation-math-seeds

Source: Hugging Face. Updated 2026-03-24; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Nirav-Madhani/meditation-math-seeds
Description:
---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- math
- meditation
- reinforcement-learning
- GRPO
- seed-data
size_categories:
- 1K<n<10K
---

# Meditation Math Seeds

Training data for the **Meditation** project: self-supervised mathematical introspection as a training phase for language models.

## Dataset Structure

| File | Records | Description |
|------|---------|-------------|
| topics.json | 188 | Math topics across 3 tiers, with references |
| seed_meditations.jsonl | 1,504 | Raw meditations generated by Gemini 3.1 Flash Lite |
| seeds_filtered.jsonl | 601 | Quality-filtered meditations (40% pass rate: 601 of 1,504) |
| sft_formatted.jsonl | 601 | Filtered meditations in chat format for SFT training |

## Topic Tiers

- **Tier 1** (50%): Fully verifiable with SymPy (arithmetic, modular arithmetic, combinatorics, series, polynomials)
- **Tier 2** (30%): Partially verifiable (calculus, linear algebra, number theory, optimization)
- **Tier 3** (20%): Judge-evaluated (abstract algebra, real analysis, topology, competition math)

## Quality Filters

- Word count: 80-800 words
- Math-character density >= 15
- A self-posed problem is detected
- SymPy verification: no failed claims (tiers 1-2)
- Minimum verified claims: tier 1 >= 2, tier 2 >= 1

## Citation

See the [paper](https://github.com/Nirav-Madhani/MeditationLearning) for details.
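The quality filters listed above can be sketched as a single predicate over one meditation. This is a minimal sketch, not the project's own pipeline: the card does not document the JSONL schema, so the `Meditation` record and all of its field names are hypothetical. Word count is assumed to be whitespace-delimited, and the math-character density and the verified/failed claim counts are assumed to be precomputed (for example by a SymPy checker) rather than derived here.

```python
from dataclasses import dataclass


# Hypothetical record shape: the card does not specify field names,
# so everything below is an assumption for illustration only.
@dataclass
class Meditation:
    text: str
    tier: int                     # topic tier: 1, 2, or 3
    math_density: float           # "math character density"; definition not given in the card
    has_self_posed_problem: bool  # result of the (unspecified) self-posed-problem detector
    verified_claims: int          # claims a SymPy checker confirmed (assumed precomputed)
    failed_claims: int            # claims a SymPy checker refuted (assumed precomputed)


# Minimum verified claims per SymPy-checkable tier, as stated in the card.
MIN_VERIFIED = {1: 2, 2: 1}


def passes_filters(m: Meditation) -> bool:
    """Return True if a meditation passes all of the card's quality filters."""
    # Word count, assuming whitespace tokenization.
    words = len(m.text.split())
    if not 80 <= words <= 800:
        return False
    if m.math_density < 15:
        return False
    if not m.has_self_posed_problem:
        return False
    if m.tier in MIN_VERIFIED:
        # Tiers 1-2: reject any failed claim...
        if m.failed_claims > 0:
            return False
        # ...and require the tier's minimum number of verified claims.
        if m.verified_claims < MIN_VERIFIED[m.tier]:
            return False
    return True


good = Meditation(text="x " * 100, tier=1, math_density=20.0,
                  has_self_posed_problem=True, verified_claims=2, failed_claims=0)
print(passes_filters(good))  # True
```

Because tier 3 is judge-evaluated rather than SymPy-checked, the sketch skips both claim checks for tier 3 and applies only the length, density, and self-posed-problem filters.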
Provider:
Nirav-Madhani