aakashMeghwar01/Sindhi-Intelligence-Core-SFT
收藏Hugging Face2026-03-05 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/aakashMeghwar01/Sindhi-Intelligence-Core-SFT
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
language:
- sd
tags:
- sindhi
- sft
- instruct
- linguistics
- logic
pretty_name: Sindhi Intelligence Core SFT
size_categories:
- 100K<n<1M
task_categories:
- text-generation
- question-answering
configs:
- config_name: default
data_files:
- split: train
path: data/train.jsonl
---
# 🧠 Sindhi Intelligence Core SFT
This is a premium, high-density instruction dataset designed for training Large Language Models (LLMs) to master the Sindhi language. With **361,225 rows**, it provides a robust foundation for grammar, factual knowledge, and logical reasoning.
## 📊 Dataset Summary
This dataset was created by consolidating multiple high-quality Sindhi corpora into a unified **ChatML** format. It is specifically optimized for **Supervised Fine-Tuning (SFT)**.
## 📁 Data Structure
- **Format:** JSON Lines (.jsonl)
- **Total Rows:** 361,225
## 🔍 Data Sources & Credits
Special thanks to the researchers whose work forms the core of this dataset:
- **AMBILE Team**: Sindhi WordNet (Grammar & Gender)
- **Danish Mahdi**: Encyclopedia Sindhiana, Legal Dataset, & News Corpora
- **Owais Raza**: Daily Kawish Articles & Linguistic processing
## 📜 Citation
If you use this dataset, please cite:
https://huggingface.co/datasets/aakashMeghwar01/Sindhi-Intelligence-Core-SFT
提供机构:
aakashMeghwar01



