Zyroxx66/Somali-Somlish-Instruct-2K-Dataset
收藏Hugging Face2026-04-06 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/Zyroxx66/Somali-Somlish-Instruct-2K-Dataset
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- so
- en
license: apache-2.0
size_categories:
- 1K<n<10K
task_categories:
- text-generation
- question-answering
tags:
- somali
- somlish
- instruct
- tech
- discord
---
# Somlish-Tech-Instruct-2K
This is the first-of-its-kind **Somlish** (Somali + English) instruction-tuning dataset. It contains 2,312 rows of high-quality synthetic data generated to teach AI models how to speak like a modern Somali tech enthusiast.
## 🌟 Why this exists
Standard Somali datasets are often too formal. This dataset uses natural "Discord-style" slang (Niyo, Sxb, Bro) while maintaining English technical terms (API, GPU, React) to ensure the AI stays smart and logical.
## 📊 Dataset Structure
Each row follows the standard Instruction/Output format:
- **Instruction:** A technical question or request in Somlish or English.
- **Output:** A helpful, brotherly response in Somlish.
## 🚀 Recommended Use
Perfect for fine-tuning small models (2B - 7B) like Qwen, Gemma, and Phi-3 to create Somali-language chatbots.
Created by: **Zyroxx66**
提供机构:
Zyroxx66



