
razielAI/HE-Math

Source: Hugging Face. Updated 2026-04-17. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/razielAI/HE-Math
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- he
tags:
- math
---

# Orion-1 Math Dataset: K-12 Hebrew Educational Corpus

## Overview

The **Orion-1 Math Dataset** is a comprehensive, high-quality synthetic and curated dataset specifically engineered for training Large Language Models (LLMs) in the field of mathematics education. It is the core data foundation for **Orion-1**, a Socratic Hebrew-native model developed by **TopAI Projects**.

Unlike generic math datasets, Orion-1 is built to reflect the Israeli Ministry of Education's curriculum, focusing on pedagogical progression from elementary school to advanced high school levels.

## Dataset Structure

The dataset is meticulously organized into **12 distinct files**, each representing a full academic year in the Israeli K-12 system:

- **Elementary School (Grades 1-6):** Focuses on foundational arithmetic, number sense, the decimal system, basic geometry, fractions, and decimals.
- **Middle School (Grades 7-9):** Introduces algebraic thinking, functions, linear equations, basic statistics, and spatial geometry.
- **High School (Grades 10-12):** Advanced mathematics including Calculus (differentiation and integration), Trigonometry, Vectors, Complex Numbers, and Probability, tailored for the "Bagrut" (Matriculation) exams.

## Data Format & Schema

To ensure optimal SFT (Supervised Fine-Tuning) performance, every data entry follows a strict Socratic structure. This forces the model to learn reasoning and guidance rather than just providing immediate answers.

### The Orion Block Format:

- **Topic:** The specific mathematical concept.
- **Definition:** A simplified, age-appropriate explanation.
- **Logical Explanation:** Intuitive analogies (e.g., using money, shapes, or real-world objects).
- **Solved Example:** A step-by-step solution rendered in full LaTeX.
- **Common Error Analysis:** Identification of frequent student misconceptions and why they occur.
- **Orion Tutor Dialogue:** A Socratic interaction where the AI guides a "stuck" student through hints and leading questions without revealing the solution.

## Technical Specifications

- **Language:** Native Hebrew (Israeli dialect).
- **Mathematical Notation:** Standard LaTeX throughout all files for seamless rendering in modern UI/UX environments.
- **Total Files:** 12 (.txt or .jsonl format).
- **Pedagogical Goal:** To reduce "answer-grabbing" behavior and promote "thinking-native" AI interactions in Hebrew.

## Mission Statement

The goal of this dataset is to empower the **Duchifat model series** to become the most advanced educational AI in Israel. By providing a structured path from Grade 1 to Grade 12, Orion-1 ensures that the AI understands the complexity of math as it evolves through a student's life.

---

**Developed and curated by TopAI Projects.**

**Status: Coming Soon to the Hugging Face Community.**
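
As an illustration of the Orion Block Format and the grade grouping described in the dataset card, the sketch below models one entry as a Python dataclass and serializes it as a single JSONL line. The card does not specify the on-disk schema, so everything here is an assumption made for illustration: the field names, the `grade` field, the `tutor_dialogue` layout, and the helpers `school_stage` and `to_jsonl_line` are all hypothetical. Real entries would be written in Hebrew throughout; most of the sample text below is an English placeholder.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class OrionBlock:
    """One hypothetical entry; field names are assumed, not taken from the dataset."""
    grade: int                   # 1-12; the card describes one file per grade
    topic: str
    definition: str
    logical_explanation: str
    solved_example: str          # step-by-step solution in LaTeX
    common_error_analysis: str
    tutor_dialogue: list[dict]   # assumed layout: [{"role": ..., "text": ...}, ...]


def school_stage(grade: int) -> str:
    """Map a grade to the stage grouping listed under Dataset Structure."""
    if not 1 <= grade <= 12:
        raise ValueError(f"grade must be in 1..12, got {grade}")
    if grade <= 6:
        return "elementary"
    if grade <= 9:
        return "middle"
    return "high"


def to_jsonl_line(block: OrionBlock) -> str:
    """Serialize one block as a single JSON line, keeping Hebrew text unescaped."""
    return json.dumps(asdict(block), ensure_ascii=False)


example = OrionBlock(
    grade=4,
    topic="שברים",  # "fractions"
    definition="A fraction describes a part of a whole.",
    logical_explanation="Think of a pizza cut into equal slices.",
    solved_example=r"\frac{1}{2} + \frac{1}{4} = \frac{2}{4} + \frac{1}{4} = \frac{3}{4}",
    common_error_analysis="Adding numerators and denominators separately.",
    tutor_dialogue=[
        {"role": "student", "text": "Is 1/2 + 1/4 equal to 2/6?"},
        {"role": "tutor", "text": "What happens if both fractions use the same denominator?"},
    ],
)

if __name__ == "__main__":
    print(school_stage(example.grade))
    print(to_jsonl_line(example))
```

Passing `ensure_ascii=False` keeps Hebrew characters readable in the output file instead of escaping them as `\uXXXX` sequences, which matters for a Hebrew-native corpus that people will inspect by hand.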
Provided by:
razielAI