
Zyroxx66/Somali-Reasoning-Dataset

Source: Hugging Face. Updated: 2026-04-05. Indexed: 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Zyroxx66/Somali-Reasoning-Dataset
Resource description:
---
language:
- so
- en
license: apache-2.0
size_categories:
- 10K<n<100K
task_categories:
- text-generation
- question-answering
tags:
- somali
- somlish
- openhermes
- logic
- instruct
- bilingual
---

# Somali-OpenHermes-Somlish-Instruct-20K 🇸🇴

This dataset is a gift to the Somali AI community. It is designed to help developers build models that reason well and converse naturally in our language.

## 🌟 What makes this unique?

This is a **hybrid dataset** that combines two sources:

1. **The Logic (18,379 rows):** A Somali translation of **[teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)**, produced with NLLB-200. This part supplies reasoning, mathematics, coding, and science examples.
2. **The Personality (2,312 rows):** A hand-curated collection of **Somlish** (Somali + English) technical instructions, **[Zyroxx66/Somali-Somlish-Instruct-2K-Dataset](https://huggingface.co/datasets/Zyroxx66/Somali-Somlish-Instruct-2K-Dataset)**, which I created specifically for this project. This part teaches the model to speak naturally, like a modern Somali tech enthusiast, using terms such as Niyo, Sxb, and Bro.

## 📊 Dataset Features

- **Bilingual Logic:** Technical English terms (API, Loop, GPU, React) are preserved within Somali sentences to maintain accuracy and reduce model "hallucinations."
- **Comprehensive Coverage:** Topics range from solving complex math word problems to explaining how to set up a Discord bot or a CSS layout.
- **Natural Tone:** Unlike standard formal translations, this dataset sounds like a real person.

## 🙏 Credits & Appreciation

- **Source Logic:** Huge thanks to Teknium and the OpenHermes team for the original English data.
- **Translation:** Powered by Meta's NLLB-200.
- **Curated Somlish:** Created with love for the Somali people by **Zyroxx66**.

## 🚀 Usage Note

This dataset is intended for fine-tuning models from 2B to 70B parameters. Training on the combined set is meant to give a model both the reasoning coverage of OpenHermes and the natural voice of the Somali tech community.

**License:** Apache 2.0 (open and free for everyone to use, modify, and build upon).
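
As a rough sanity check before fine-tuning, the row counts stated in the card can be turned into an expected total and a mix ratio, then compared with what actually downloads. The sketch below is illustrative only: the function names `composition` and `count_rows` are hypothetical, and `count_rows` assumes the repository loads with its default configuration through the Hugging Face `datasets` library, which the card does not confirm.

```python
"""Sketch: compare the card's stated row counts with the downloaded dataset."""

REPO_ID = "Zyroxx66/Somali-Reasoning-Dataset"  # repository id from the listing

# Row counts as stated in the dataset card.
LOGIC_ROWS = 18_379    # translated OpenHermes-2.5 portion
SOMLISH_ROWS = 2_312   # hand-curated Somlish portion
EXPECTED_TOTAL = LOGIC_ROWS + SOMLISH_ROWS


def composition() -> dict[str, float]:
    """Return each portion's share of the combined set, as a fraction."""
    return {
        "logic": LOGIC_ROWS / EXPECTED_TOTAL,
        "somlish": SOMLISH_ROWS / EXPECTED_TOTAL,
    }


def count_rows(repo_id: str = REPO_ID) -> int:
    """Download the dataset and return its total row count across all splits.

    Assumption: the repository loads with its default configuration.
    Requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    splits = load_dataset(repo_id)  # a DatasetDict keyed by split name
    return sum(split.num_rows for split in splits.values())


shares = composition()
print(f"expected rows: {EXPECTED_TOTAL}")
print(f"logic share:   {shares['logic']:.1%}")
print(f"somlish share: {shares['somlish']:.1%}")
```

The stated counts sum to 20,691 rows, which is consistent with the `10K<n<100K` size category, and the mix is roughly 89% translated data to 11% Somlish. Calling `count_rows()` and comparing the result with `EXPECTED_TOTAL` is one way to notice whether the hosted files have changed since the card was written.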
Provider:
Zyroxx66