
taesiri/TinyStories-Farsi

Hugging Face · Updated 2024-02-25 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/taesiri/TinyStories-Farsi
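The page only gives a download link. Below is a minimal Python sketch for fetching the data programmatically. It assumes that the Hugging Face `datasets` library is installed and that the dataset loads with its default configuration. The helper name `load_tinystories_farsi` is hypothetical. The optional switch to the hf-mirror.com endpoint through the `HF_ENDPOINT` environment variable is an assumption about the reader's network setup, not something the dataset card prescribes.

```python
import os


def load_tinystories_farsi(use_mirror: bool = False):
    """Hypothetical helper: load taesiri/TinyStories-Farsi via Hugging Face `datasets`.

    Assumptions: the dataset is public and loads with its default config.
    Split and column names are not documented on this page, so inspect the
    returned DatasetDict before relying on specific fields.
    """
    if use_mirror:
        # Assumption: the reader wants the hf-mirror.com endpoint shown above.
        # huggingface_hub reads HF_ENDPOINT when it is first imported, so this
        # only takes effect if `datasets` has not been imported yet in this process.
        os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")
    # Imported lazily so that defining this helper does not require the package.
    from datasets import load_dataset

    return load_dataset("taesiri/TinyStories-Farsi")
```

Calling `load_tinystories_farsi()` and printing the result shows the splits, column names, and row counts that are actually available.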
Resource description:
---
license: cdla-sharing-1.0
task_categories:
- text-generation
- text2text-generation
language:
- fa
- en
tags:
- Persian
- Farsi
- English2Farsi
- Farsi2English
pretty_name: Tiny Stories - Farsi
size_categories:
- 100K<n<1M
---

# Tiny Stories Farsi

The _Tiny Stories Farsi_ project is an ongoing effort to translate the [Tiny Stories dataset](https://huggingface.co/datasets/roneneldan/TinyStories) into Persian (Farsi). The primary goal is to produce a high-quality Farsi dataset that remains equivalent to the original English version, and then to use it to train Farsi language models. The aim is to confirm that the advances and trends observed in English language models are replicable in other languages.

So far, the project has translated more than 27,000 entries from the validation set, originally generated by `GPT-4`, into Farsi, using the `Claude-2.0` language model for the translation. The project remains active and welcomes contributions and collaboration aimed at enriching non-English data for machine learning and artificial intelligence.

Original paper: [TinyStories: How Small Can Language Models Be and Still Speak Coherent English?](https://arxiv.org/abs/2305.07759)

# Acknowledgements

This project is made possible by the generous support of [Anthropic](https://www.anthropic.com/), which provided free access to the `Claude-2.0` API.
Provided by:
taesiri
Summary of original metadata

Dataset overview

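Basic information

  • License: cdla-sharing-1.0
  • Task categories:
    • Text generation
    • Text-to-text generation
  • Languages:
    • Persian (fa)
    • English (en)
  • Tags:
    • Persian
    • Farsi
    • English to Farsi
    • Farsi to English
  • Size category: 100K<n<1M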

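Dataset details

  • Name: Tiny Stories - Farsi
  • Description: The project aims to translate the Tiny Stories dataset into Persian (Farsi), producing a high-quality Farsi dataset that stays equivalent to the original English version and can be used to train Farsi language models. So far, more than 27,000 entries from the validation set have been translated from English into Farsi using the Claude-2.0 language model.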

