
ffuuugor/tinystories_spanish

Source: Hugging Face. Updated 2025-11-15; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/ffuuugor/tinystories_spanish
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: text_es
    dtype: string
  splits:
  - name: train
    num_bytes: 3951766524
    num_examples: 2119719
  - name: validation
    num_bytes: 39952442
    num_examples: 21990
  download_size: 2068469787
  dataset_size: 3991718966
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
license: cdla-sharing-1.0
language:
- en
- es
pretty_name: Bilingual TinyStories (English/Spanish)
size_categories:
- 1M<n<10M
---

# TinyStories Bilingual (English-Spanish)

## Dataset Description

This dataset is a bilingual version of [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories), containing the original English stories alongside Spanish translations of each story.

### Changes from Original Dataset

- **Added**: Spanish translations for all stories
- **Translation Method**: Automated translation using `claude-3-haiku-20240307`
- **Format**: Each example now contains both English and Spanish versions

## Dataset Structure

```python
{
    "text": "One day, a little girl named Lily found a needle in her room.",
    "text_es": "Un día, una niña pequeña llamada Lily encontró una aguja en su habitación."
}
```

## Translation Methodology

Spanish translations were generated using the following prompt with `claude-3-haiku-20240307`:

```
Translate the following short story into Spanish. Keep the same tone, style, and meaning.
The translation should be natural and fluent Spanish, appropriate for children.

English story: {story}

Spanish translation:
```

### Attribution

This dataset is based on and derived from:

- **Original Dataset**: [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
- **Original Paper**: ["TinyStories: How Small Can Language Models Be and Still Speak Coherent English?"](https://arxiv.org/abs/2305.07759)
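
As an illustration of how the prompt under Translation Methodology maps onto the `text` / `text_es` record layout, here is a minimal sketch. It is not the author's pipeline: the helper names `build_prompt` and `make_record` are hypothetical, the line breaks inside the template are an assumption (the card does not specify them), and the `translate` argument stands in for whatever client actually sends the prompt to the model.

```python
from typing import Callable, Dict

# Prompt wording taken from the dataset card. The exact line breaks are an
# assumption; {story} is the only placeholder.
PROMPT_TEMPLATE = (
    "Translate the following short story into Spanish. "
    "Keep the same tone, style, and meaning. "
    "The translation should be natural and fluent Spanish, "
    "appropriate for children.\n\n"
    "English story: {story}\n\n"
    "Spanish translation:"
)


def build_prompt(story: str) -> str:
    """Fill the card's prompt template with one English story (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(story=story)


def make_record(story: str, translate: Callable[[str], str]) -> Dict[str, str]:
    """Build one bilingual example with the card's two string fields.

    `translate` is a caller-supplied function that takes the filled prompt
    and returns the model's Spanish output. It is a placeholder for a real
    model call, which this sketch deliberately does not implement.
    """
    spanish = translate(build_prompt(story)).strip()
    return {"text": story, "text_es": spanish}
```

Passing any function that maps a prompt string to a translation string as `translate` yields a dictionary with the same two fields as the example under Dataset Structure. Keeping the model call behind a plain callable lets the prompt construction be checked without network access.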

Provider:
ffuuugor