
Omartificial-Intelligence-Space/Arabic-Math-SFT

Hugging Face · Updated 2026-04-09 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/Omartificial-Intelligence-Space/Arabic-Math-SFT
Resource description:
---
language:
- ar
license: apache-2.0
size_categories:
- 1K<n<10K
task_categories:
- visual-question-answering
- image-to-text
tags:
- math
- geometry
- arabic
- multimodal
- education
- stem
- visual-math
- sft
pretty_name: Arabic Math SFT
dataset_info:
  features:
  - name: image
    dtype: image
  - name: problem
    dtype: string
  - name: solution
    dtype: string
  splits:
  - name: train
    num_examples: 5000
---

<div align="center">

# Arabic Math SFT

### A Multimodal Arabic Mathematics Dataset for Supervised Fine-Tuning

[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
[![Language](https://img.shields.io/badge/Language-Arabic-blue.svg)](#)
[![Tasks](https://img.shields.io/badge/Task-Visual_Math_QA-orange.svg)](#)
[![Size](https://img.shields.io/badge/Samples-5,000-purple.svg)](#)

---

*Empowering Arabic AI with visual mathematical reasoning*

</div>

## Overview

**Arabic-Math-SFT** is a curated multimodal dataset designed for supervised fine-tuning of vision-language models on **mathematical problem-solving in Arabic**. Each sample pairs a geometric or algebraic diagram with an Arabic-language problem statement and its corresponding solution.

This dataset addresses a gap in Arabic AI resources: math reasoning benchmarks exist for English and Chinese, but high-quality Arabic visual math datasets remain scarce.

## Dataset Structure

| Column | Type | Description |
|:---------|:---------|:----------------------------------------------|
| `image` | `Image` | Mathematical diagram (geometry, graphs, etc.) |
| `problem` | `string` | Problem statement in Arabic |
| `solution` | `string` | Answer wrapped in `<answer>` tags |

## Examples

<table>
<tr>
<td width="50%">

**Example 1**

> **المسألة:** إذا كانت زاوية ABC تساوي 25 درجة والنقاط A و B و C تقع جميعها على الدائرة O، فما قياس زاوية AOC؟
>
> **الحل:** `<answer> 50° </answer>`

</td>
<td width="50%">

**Example 2**

> **المسألة:** في الرسم المقدم، مع متوازي الأضلاع ABCD، الخط AE يقسم زاوية BAD ويقطع CD عند النقطة E. إذا كانت أطوال AD و AB هي 3.0 و 4.0 على التوالي، ما هو قياس EC؟
>
> **الحل:** `<answer> 1 </answer>`

</td>
</tr>
</table>

## Statistics

| Metric | Value |
|:------|:------|
| Total Samples | 5,000 |
| Avg Problem Length | ~127 chars |
| Min Problem Length | 5 chars |
| Max Problem Length | 476 chars |
| Samples with Images | 100% |

## Topics Covered

The dataset spans mathematical topics typically found in middle and high school curricula:

- **Geometry**: triangles, circles, parallel lines, angles, polygons
- **Measurement**: area, perimeter, arc length, volume
- **Algebra**: solving for unknowns from geometric relationships
- **Trigonometry**: sine, cosine, tangent in geometric contexts
- **Coordinate Geometry**: points, distances, slopes

## Usage

```python
from datasets import load_dataset

# Load the single "train" split (5,000 examples)
ds = load_dataset("Omartificial-Intelligence-Space/Arabic-Math-SFT", split="train")

sample = ds[0]
sample["image"].show()     # Open the diagram in the default image viewer
print(sample["problem"])   # Arabic problem text
print(sample["solution"])  # Answer wrapped in <answer> tags
```

## Intended Use

- **SFT** for Arabic multimodal LLMs (math reasoning)
- **Benchmarking** Arabic vision-language models on STEM tasks
- **Educational tools** for Arabic-speaking students
- **Research** in cross-lingual mathematical reasoning

## Source & Construction

This dataset was constructed by sampling **5,000** examples from [`kolerk/TON-Math-SFT`](https://huggingface.co/datasets/kolerk/TON-Math-SFT) (8,031 total).

## Citation

```bibtex
@dataset{arabic_math_sft_2026,
  title={Arabic-Math-SFT: A Multimodal Arabic Mathematics Dataset},
  author={Omartificial Intelligence Space},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/Omartificial-Intelligence-Space/Arabic-Math-SFT}
}
```

---

<div align="center">
<b>Built by <a href="https://huggingface.co/Omartificial-Intelligence-Space">Omartificial Intelligence Space</a></b>
</div>
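
Because the `solution` column stores the answer wrapped in `<answer>` tags, downstream evaluation code will usually want the bare answer string. The following is a minimal sketch of one way to extract it, assuming every solution follows the `<answer> ... </answer>` pattern shown in the examples. The names `ANSWER_PATTERN` and `extract_answer` are hypothetical helpers and are not shipped with the dataset or the `datasets` library.

```python
import re
from typing import Optional

# Hypothetical helper, not part of the dataset or of the `datasets` library.
# Assumes answers look like "<answer> 50° </answer>", as in the card's examples.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(solution: str) -> Optional[str]:
    """Return the text inside the first <answer>...</answer> pair, or None."""
    match = ANSWER_PATTERN.search(solution)
    if match is None:
        return None
    # Strip the padding spaces that surround the answer inside the tags.
    return match.group(1).strip()


# Applied to the solution string from Example 1 in the card.
example = extract_answer("<answer> 50° </answer>")  # example == "50°"
```

Returning `None` for untagged strings, rather than raising, lets a caller count malformed rows separately when mapping the helper over the whole split.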
Provider:
Omartificial-Intelligence-Space