taesiri/TinyStories-Farsi
Source: Hugging Face. Updated 2024-02-25; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/taesiri/TinyStories-Farsi
Description:
---
license: cdla-sharing-1.0
task_categories:
- text-generation
- text2text-generation
language:
- fa
- en
tags:
- Persian
- Farsi
- English2Farsi
- Farsi2English
pretty_name: Tiny Stories - Farsi
size_categories:
- 100K<n<1M
---
# Tiny Stories Farsi
The _Tiny Stories Farsi_ project is an ongoing effort to translate the [Tiny Stories dataset](https://huggingface.co/datasets/roneneldan/TinyStories) into Persian (Farsi). The primary goal is to produce a high-quality Farsi dataset that stays equivalent to the original English version, and then to use it for training Farsi language models. The project aims to show that the advances and trends observed in English language models can be replicated in other languages. So far, over 27,000 entries from the validation set, originally generated by `GPT-4`, have been translated into Farsi using the `Claude-2.0` language model. The project remains active and welcomes contributions toward enriching non-English data for machine learning and artificial intelligence.
Original paper: [TinyStories: How Small Can Language Models Be and Still Speak Coherent English?](https://arxiv.org/abs/2305.07759)
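Since the card lists both `text-generation` and `text2text-generation` as tasks, a minimal sketch of preparing the data for either translation direction may be useful. It assumes the Hugging Face `datasets` library is installed, that the repository exposes a `train` split, and that each row has one English and one Farsi text column. The split name and the column names are not stated in this card, so they are placeholders: inspect `column_names` on the loaded dataset before relying on them.

```python
from typing import Iterable, List, Mapping, Optional, Tuple

REPO_ID = "taesiri/TinyStories-Farsi"  # repository name taken from this card


def load_rows(split: str = "train"):
    """Load one split of the dataset.

    The split name is an assumption; check the dataset viewer for the real ones.
    """
    # Imported lazily so the helper below works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


def to_translation_pairs(
    rows: Iterable[Mapping[str, Optional[str]]],
    src_key: str,
    tgt_key: str,
) -> List[Tuple[str, str]]:
    """Build (source, target) text pairs, skipping rows with missing or empty text.

    `src_key` and `tgt_key` are hypothetical column names supplied by the caller.
    """
    pairs = []
    for row in rows:
        src = (row.get(src_key) or "").strip()
        tgt = (row.get(tgt_key) or "").strip()
        if src and tgt:
            pairs.append((src, tgt))
    return pairs
```

For example, `rows = load_rows()` followed by `print(rows.column_names)` shows the actual column names, which can then be passed as `src_key` and `tgt_key`. Swapping the two keys yields Farsi-to-English pairs from the same rows.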
# Acknowledgements
This project is made possible through the generous support of [Anthropic](https://www.anthropic.com/), who provided free access to the `Claude-2.0` API.
Provider:
taesiri
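Summary of original information
Dataset overview
Basic information
- License: cdla-sharing-1.0
- Task categories:
  - Text generation
  - Text-to-text generation
- Languages:
  - Persian (fa)
  - English (en)
- Tags:
  - Persian
  - Farsi
  - English to Farsi
  - Farsi to English
- Size category: 100K<n<1M
Dataset details
- Name: Tiny Stories - Farsi
- Description: The project aims to translate the Tiny Stories dataset into Persian (Farsi), producing a high-quality Farsi dataset that stays equivalent to the original English version and can be used to train Farsi language models. So far, over 27,000 entries from the validation set have been translated from English into Farsi using the Claude-2.0 language model.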



