Zyroxx66/Somali-Reasoning-Dataset
收藏Hugging Face2026-04-05 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/Zyroxx66/Somali-Reasoning-Dataset
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- so
- en
license: apache-2.0
size_categories:
- 10K<n<100K
task_categories:
- text-generation
- question-answering
tags:
- somali
- somlish
- openhermes
- logic
- instruct
- bilingual
---
# Somali-OpenHermes-Somlish-Instruct-20K 🇸🇴
This dataset is a gift to the Somali AI community. It is designed to help developers build models that are both highly intelligent and naturally conversational in our language.
## 🌟 What makes this unique?
This is a **Hybrid Dataset** that combines two powerful sources:
1. **The Logic (18,379 rows):** A Somali translation of the world-class **[teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)**. This part provides the AI with deep reasoning, mathematics, coding, and science capabilities. (Translated using NLLB-200).
2. **The Personality (2,312 rows):** A hand-curated collection of **Somlish** (Somali + English) technical instructions. **[Zyroxx/Somali-Somalish](https://huggingface.co/datasets/Zyroxx66/Somali-Somlish-Instruct-2K-Dataset)**(I created specifically for this project). This part teaches the model how to speak naturally like a modern Somali tech enthusiast (using terms like Niyo, Sxb, and Bro).
## 📊 Dataset Features
- **Bilingual Logic:** Technical English terms (API, Loop, GPU, React) are preserved within Somali sentences to maintain accuracy and prevent model "hallucinations."
- **Comprehensive Coverage:** From solving complex math word problems to explaining how to set up a Discord bot or a CSS layout.
- **Natural Tone:** Unlike standard formal translations, this dataset sounds like a real person.
## 🙏 Credits & Appreciation
- **Source Logic:** Huge thanks to Teknium and the OpenHermes team for the original English data.
- **Translation:** Powered by Meta's NLLB-200.
- **Curated Somlish:** Created with love for the Somali people by **Zyroxx66**.
## 🚀 Usage Note
This dataset is perfect for fine-tuning models from 2B to 70B parameters. By using this combined set, your model will gain the intelligence of OpenHermes and the natural voice of the Somali tech community.
**License:** Apache 2.0 (Open and free for everyone to use, modify, and build upon).
提供机构:
Zyroxx66



