Gabrui/multilingual_TinyStories
Source: Hugging Face. Updated 2024-10-03. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/Gabrui/multilingual_TinyStories
Description:
The Multilingual TinyStories dataset contains translations of the original TinyStories dataset, a collection of synthetically generated short stories in English that use a small vocabulary suitable for 3- to 4-year-olds. The original stories were generated by GPT-3.5 and GPT-4. The multilingual versions cover languages including Spanish, Chinese, German, Turkish, Farsi, Korean, Arabic, Vietnamese, Hebrew, and Hindi. The dataset is intended for training and evaluating small language models (SLMs) in multiple languages, and it lets researchers study scaling laws, interpretability, and other phenomena across languages in models with fewer than 10 million parameters.
Provider:
Gabrui



