Geralt-Targaryen/tinystories
Source: Hugging Face. Updated: 2025-01-05. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/Geralt-Targaryen/tinystories
Description:
The TinyStories dataset consists of stories generated by GPT-3.5 and GPT-4. After cleaning and deduplication, it contains 2,205,910 stories from GPT-3.5 and 2,733,747 stories from GPT-4. The dataset has also been decontaminated against multiple well-known NLP benchmark datasets based on n-gram overlap, removing content that overlaps with those benchmarks. Two documents were removed from the dataset because of such overlap.
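As a rough illustration of what n-gram-overlap decontamination can look like, here is a minimal sketch. It is not the maintainer's actual pipeline: the whitespace tokenization, the n-gram length `n=5`, the rule "drop a document if it shares any n-gram with a benchmark", and all function names are assumptions made for the example.

```python
def ngrams(text, n):
    """Return the set of lowercase, whitespace-tokenized word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_benchmark_index(benchmark_texts, n):
    """Collect every n-gram that appears in any benchmark text."""
    index = set()
    for text in benchmark_texts:
        index |= ngrams(text, n)
    return index


def is_contaminated(document, benchmark_index, n):
    """Assumed rule: a document is contaminated if it shares at least one n-gram."""
    return not ngrams(document, n).isdisjoint(benchmark_index)


def decontaminate(documents, benchmark_texts, n=5):
    """Keep only the documents that share no n-gram with the benchmark texts."""
    index = build_benchmark_index(benchmark_texts, n)
    return [doc for doc in documents if not is_contaminated(doc, index, n)]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    benchmark = ["The quick brown fox jumps over the lazy dog near the river bank"]
    stories = [
        "Once upon a time a little girl found a red ball in the park",
        "Tom said the quick brown fox jumps over the lazy dog and laughed",
    ]
    for story in decontaminate(stories, benchmark, n=5):
        print(story)
```

A real pipeline would likely use a proper tokenizer and a tuned n-gram length or overlap threshold. The set-based index is chosen here only because it keeps the per-document check simple.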
Provider:
Geralt-Targaryen



