ogulcanaydogan/Turkish-LLM-v10-Training
收藏Hugging Face2026-03-12 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/ogulcanaydogan/Turkish-LLM-v10-Training
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- tr
license: apache-2.0
size_categories:
- 100K<n<1M
task_categories:
- text-generation
tags:
- turkish
- sft
- instruction-tuning
- low-resource
- nlp
pretty_name: Turkish LLM Training Dataset v10
---
# Turkish LLM Training Dataset v10
A curated corpus of **144,022 Turkish instruction-completion pairs** used to train the [Turkish LLM Family](https://huggingface.co/collections/ogulcanaydogan/turkish-llm-family-69b303b4ef1c36caffca4e94).
## Dataset Description
This dataset was created to address the scarcity of high-quality Turkish instruction-following data for language model fine-tuning. It covers a broad range of topics including:
- **Science & Technology** (physics, chemistry, biology, computer science)
- **History & Geography** (Turkish and world history, geography)
- **General Knowledge** (culture, society, daily life)
- **Mathematics** (arithmetic, algebra, problem-solving)
- **Language & Literature** (Turkish grammar, literature analysis)
## Dataset Structure
| Field | Type | Description |
|-------|------|-------------|
| `prompt` | string | The instruction or question in Turkish |
| `completion` | string | The expected response in Turkish |
### Statistics
| Metric | Value |
|--------|-------|
| Total examples | 144,022 |
| Format | JSONL |
| Language | Turkish (tr) |
| Average prompt length | ~50 tokens |
| Average completion length | ~150 tokens |
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("ogulcanaydogan/Turkish-LLM-v10-Training")
print(dataset["train"][0])
# {'prompt': 'Çocuklar kaç süt dişi kaybeder?', 'completion': 'Çocuklar büyüdükçe 20 süt dişini kaybederler...'}
```
## Models Trained on This Data
- [Turkish-LLM-14B-Instruct](https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct) — 14.7B parameters
- [Turkish-LLM-7B-Instruct](https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct) — 7B parameters
## License
Apache 2.0
## Citation
```bibtex
@misc{aydogan2026turkishllm,
title={Turkish LLM Family: Open-Source Turkish Language Models},
author={Aydogan, Ogulcan},
year={2026},
url={https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct}
}
```
---
# Türkçe
## Türkçe LLM Eğitim Verisi v10
[Turkish LLM Ailesi](https://huggingface.co/collections/ogulcanaydogan/turkish-llm-family-69b303b4ef1c36caffca4e94) modellerinin eğitiminde kullanılan **144.022 Türkçe talimat-cevap çifti** içeren küratörlü bir veri seti.
### Kapsam
- Fen ve teknoloji, tarih ve coğrafya, genel kültür, matematik, dil ve edebiyat konularını kapsar.
### Kullanım
```python
from datasets import load_dataset
dataset = load_dataset("ogulcanaydogan/Turkish-LLM-v10-Training")
```
提供机构:
ogulcanaydogan



