ffuuugor/tinystories_spanish
收藏Hugging Face2025-11-15 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/ffuuugor/tinystories_spanish
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: text
dtype: string
- name: text_es
dtype: string
splits:
- name: train
num_bytes: 3951766524
num_examples: 2119719
- name: validation
num_bytes: 39952442
num_examples: 21990
download_size: 2068469787
dataset_size: 3991718966
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
- split: validation
path: data/validation-*
license: cdla-sharing-1.0
language:
- en
- es
pretty_name: Bilingual TinyStories (English/Spanish)
size_categories:
- 1M<n<10M
---
# TinyStories Bilingual (English-Spanish)
## Dataset Description
This dataset is a bilingual version of [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories), containing the original English stories alongside Spanish translations of each story.
### Changes from Original Dataset
- **Added**: Spanish translations for all stories
- **Translation Method**: Automated translation using `claude-3-haiku-20240307`
- **Format**: Each example now contains both English and Spanish versions
## Dataset Structure
```python
{
"text": "One day, a little girl named Lily found a needle in her room.",
"text_es": "Un día, una niña pequeña llamada Lily encontró una aguja en su habitación."
}
```
## Translation Methodology
Spanish translations were generated using the following prompt with `claude-3-haiku-20240307`:
```
Translate the following short story into Spanish. Keep the same tone, style, and meaning.
The translation should be natural and fluent Spanish, appropriate for children.
English story:
{story}
Spanish translation:
```
### Attribution
This dataset is based on and derived from:
- **Original Dataset**: [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
- **Original Paper**: ["TinyStories: How Small Can Language Models Be and Still Speak Coherent English?"](https://arxiv.org/abs/2305.07759)
dataset_info:
数据集信息:
features:
- 字段名: text
数据类型: 字符串
- 字段名: text_es
数据类型: 字符串
splits:
- 拆分名称: train(训练集)
字节数: 3951766524
样本数量: 2119719
- 拆分名称: validation(验证集)
字节数: 39952442
样本数量: 21990
download_size: 2068469787
dataset_size: 3991718966
configs:
- 配置名称: default(默认配置)
数据文件:
- 拆分: train(训练集)
路径: data/train-*
- 拆分: validation(验证集)
路径: data/validation-*
license: cdla-sharing-1.0
language:
- 英语(en)
- 西班牙语(es)
pretty_name: 双语 TinyStories(英语/西班牙语)
size_categories:
- 100万<样本数<1000万
# 双语TinyStories(英语-西班牙语)
## 数据集说明
本数据集是[roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)的双语版本,包含原始英文故事及每篇故事的西班牙语译文。
### 与原始数据集的差异
- **新增**: 为所有故事添加西班牙语译文
- **翻译方法**: 使用`claude-3-haiku-20240307`模型进行自动翻译
- **格式调整**: 每条样本现已同时包含英文与西班牙语版本
## 数据集结构
python
{
"text": "One day, a little girl named Lily found a needle in her room.",
"text_es": "Un día, una niña pequeña llamada Lily encontró una aguja en su habitación."
}
## 翻译方法
西班牙语译文通过`claude-3-haiku-20240307`模型使用如下提示词生成:
Translate the following short story into Spanish. Keep the same tone, style, and meaning.
The translation should be natural and fluent Spanish, appropriate for children.
English story:
{story}
Spanish translation:
### 归属声明
本数据集基于并衍生自以下内容:
- **原始数据集**: [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
- **原始论文**: 《TinyStories: How Small Can Language Models Be and Still Speak Coherent English?》(https://arxiv.org/abs/2305.07759)
提供机构:
ffuuugor



