
Geralt-Targaryen/tinystories

Source: Hugging Face | Updated 2025-01-05 | Added to catalog 2025-02-15
Download link:
https://hf-mirror.com/datasets/Geralt-Targaryen/tinystories
Description:
The TinyStories dataset consists of stories generated by GPT-3.5 and GPT-4. After cleaning and deduplication, it contains 2,205,910 stories generated by GPT-3.5 and 2,733,747 generated by GPT-4 (4,939,657 in total). To ensure its quality and applicability, the dataset was decontaminated against several well-known NLP benchmark datasets based on n-gram overlap, and content overlapping with those benchmarks was removed. In the end, two documents were removed for this reason.
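The description says decontamination was based on n-gram overlap but does not state the n-gram length, the tokenization, or the removal threshold. The sketch below illustrates the general technique under assumed choices: lowercased alphanumeric word tokens, a hypothetical default of N = 13, and removal of any story that shares at least one n-gram with a benchmark text. All function names are hypothetical; this is not the procedure actually used to build this dataset.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumed n-gram length; the dataset description does not say which n was used.
N = 13

def tokenize(text: str) -> List[str]:
    # Assumed normalization: lowercase, keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens: List[str], n: int = N) -> Set[Tuple[str, ...]]:
    # All contiguous n-token windows; empty if the text has fewer than n tokens.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_benchmark_index(benchmark_texts: Iterable[str], n: int = N) -> Set[Tuple[str, ...]]:
    # Union of every n-gram appearing in any benchmark example.
    index: Set[Tuple[str, ...]] = set()
    for text in benchmark_texts:
        index |= ngrams(tokenize(text), n)
    return index

def decontaminate(docs: Iterable[str], index: Set[Tuple[str, ...]],
                  n: int = N) -> Tuple[List[str], List[str]]:
    # Assumed rule: drop a document if it shares at least one n-gram with the index.
    kept: List[str] = []
    removed: List[str] = []
    for doc in docs:
        if ngrams(tokenize(doc), n) & index:
            removed.append(doc)
        else:
            kept.append(doc)
    return kept, removed

# Toy example with a short n so the overlap is easy to see.
benchmark = ["The quick brown fox jumps over the lazy dog near the river bank today."]
stories = [
    "Once upon a time, a little cat found a red ball in the garden.",
    "One day the quick brown fox jumps over the lazy dog near the river bank today and smiled.",
]
index = build_benchmark_index(benchmark, n=8)
kept, removed = decontaminate(stories, index, n=8)
print(len(kept), len(removed))  # 1 1
```

Document-level removal is used here because the description reports whole documents being dropped; a real pipeline might instead use a longer n, a fractional-overlap threshold, or span-level masking.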
Provided by:
Geralt-Targaryen