ogulcanaydogan/Turkish-LLM-v10-Training

Name: ogulcanaydogan/Turkish-LLM-v10-Training
Creator: ogulcanaydogan
Published: 2026-03-12 18:20:43
License: 暂无描述

Hugging Face2026-03-12 更新2026-03-29 收录

下载链接：

https://hf-mirror.com/datasets/ogulcanaydogan/Turkish-LLM-v10-Training

下载链接

链接失效反馈

官方服务：

资源简介：

--- language: - tr license: apache-2.0 size_categories: - 100K<n<1M task_categories: - text-generation tags: - turkish - sft - instruction-tuning - low-resource - nlp pretty_name: Turkish LLM Training Dataset v10 --- # Turkish LLM Training Dataset v10 A curated corpus of **144,022 Turkish instruction-completion pairs** used to train the [Turkish LLM Family](https://huggingface.co/collections/ogulcanaydogan/turkish-llm-family-69b303b4ef1c36caffca4e94). ## Dataset Description This dataset was created to address the scarcity of high-quality Turkish instruction-following data for language model fine-tuning. It covers a broad range of topics including: - **Science & Technology** (physics, chemistry, biology, computer science) - **History & Geography** (Turkish and world history, geography) - **General Knowledge** (culture, society, daily life) - **Mathematics** (arithmetic, algebra, problem-solving) - **Language & Literature** (Turkish grammar, literature analysis) ## Dataset Structure | Field | Type | Description | |-------|------|-------------| | `prompt` | string | The instruction or question in Turkish | | `completion` | string | The expected response in Turkish | ### Statistics | Metric | Value | |--------|-------| | Total examples | 144,022 | | Format | JSONL | | Language | Turkish (tr) | | Average prompt length | ~50 tokens | | Average completion length | ~150 tokens | ## Usage ```python from datasets import load_dataset dataset = load_dataset("ogulcanaydogan/Turkish-LLM-v10-Training") print(dataset["train"][0]) # {'prompt': 'Çocuklar kaç süt dişi kaybeder?', 'completion': 'Çocuklar büyüdükçe 20 süt dişini kaybederler...'} ``` ## Models Trained on This Data - [Turkish-LLM-14B-Instruct](https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct) — 14.7B parameters - [Turkish-LLM-7B-Instruct](https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct) — 7B parameters ## License Apache 2.0 ## Citation ```bibtex @misc{aydogan2026turkishllm, title={Turkish LLM Family: Open-Source Turkish Language Models}, author={Aydogan, Ogulcan}, year={2026}, url={https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct} } ``` --- # Türkçe ## Türkçe LLM Eğitim Verisi v10 [Turkish LLM Ailesi](https://huggingface.co/collections/ogulcanaydogan/turkish-llm-family-69b303b4ef1c36caffca4e94) modellerinin eğitiminde kullanılan **144.022 Türkçe talimat-cevap çifti** içeren küratörlü bir veri seti. ### Kapsam - Fen ve teknoloji, tarih ve coğrafya, genel kültür, matematik, dil ve edebiyat konularını kapsar. ### Kullanım ```python from datasets import load_dataset dataset = load_dataset("ogulcanaydogan/Turkish-LLM-v10-Training") ```

提供机构：

ogulcanaydogan

5,000+

优质数据集

54 个

任务类型

进入经典数据集