
BrewInteractive/alpaca-tr

Hugging Face | Updated 2024-09-04 | Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/BrewInteractive/alpaca-tr
Resource summary:
---
license: apache-2.0
task_categories:
- text-generation
language:
- tr
size_categories:
- 10K<n<100K
---

## Dataset Description

**Dataset Name:** Turkish Alpaca Dataset

**Languages:** Turkish

**Data Source:**

- Original dataset derived from Alpaca GPT-4.
- Translated and localized with Gemini Flash.
- Most of the generated texts from the Alpaca dataset were modified; only the instructions were kept.
- Cleaned to remove non-Turkish text.
- Irrelevant contexts were also removed.

### Dataset Summary

This dataset contains text instructions originally sourced from the Alpaca GPT-4 dataset. The instructions were translated and localized into Turkish with Gemini Flash, and the generated texts were substantially modified. The dataset emphasizes contextually relevant, locale-specific information, which makes it suitable for fine-tuning small language models to understand and respond to Turkish language and cultural contexts.

## Motivation

### Why We Created This Dataset

Our goal is to develop, fine-tune (and possibly debug) small language models that are both context-aware and locale-aware. This dataset provides:

1. **Localized Training Data:** Essential for models intended for Turkish-language applications.
2. **Context-Aware Examples:** Helps ensure that models can understand and generate contextually relevant responses.
3. **Evaluation Resource:** Supports testing and validating how well smaller models handle Turkish instructions.

### Intended Uses

This dataset is primarily intended for fine-tuning and testing small-scale language models to:

- Improve Turkish language understanding.
- Generate Turkish text that is contextually accurate and culturally relevant.
- Benchmark performance on localized NLP tasks.

## Data Collection and Processing

This dataset will be updated and cleaned regularly to keep it relevant and to improve its coverage of Turkish-language instructions.

### Data Sources

- Alpaca GPT-4: original source of the instructions.
- Gemini Flash: used for translation and localization.

**Dataset Curator:** Brew Interactive/AI Guild (https://brewww.com)
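For instruction fine-tuning, each record is usually rendered into a single prompt-plus-response string. The sketch below follows the prompt layout of the original Stanford Alpaca project. It makes some assumptions that the card does not confirm. First, it assumes the records use the Alpaca field names `instruction`, `input`, and `output`, because the card does not list its columns. Second, it assumes a `train` split exists on the Hub. The helper names `format_example` and `load_alpaca_tr` are hypothetical.

```python
from typing import Mapping

# Prompt templates modeled on the original Stanford Alpaca format.
# Assumption: records use the Alpaca field names "instruction", "input", "output";
# the dataset card does not list its columns.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_example(record: Mapping[str, str]) -> str:
    """Render one Alpaca-style record as a single training string (prompt + target)."""
    # Treat missing or None fields as empty strings.
    instruction = (record.get("instruction") or "").strip()
    context = (record.get("input") or "").strip()
    output = (record.get("output") or "").strip()
    # Use the shorter template when the record has no extra context.
    if context:
        prompt = PROMPT_WITH_INPUT.format(instruction=instruction, input=context)
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=instruction)
    return prompt + output


def load_alpaca_tr(split: str = "train"):
    """Download the dataset from the Hugging Face Hub (needs network and `datasets`).

    Assumption: a split named "train" exists.
    """
    # Imported lazily so format_example works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset("BrewInteractive/alpaca-tr", split=split)


if __name__ == "__main__":
    sample = {
        "instruction": "Türkiye'nin başkenti neresidir?",
        "input": "",
        "output": "Türkiye'nin başkenti Ankara'dır.",
    }
    print(format_example(sample))
```

Keeping the English Alpaca scaffolding is one possible choice. A Turkish-language template (for example, `### Talimat:` headers) may suit a locale-aware model better, but that has not been evaluated here.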
Provided by:
BrewInteractive