dark-xet/test-public-dataset
Source: Hugging Face. Updated 2025-03-26; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/dark-xet/test-public-dataset
Overview:
This dataset contains short stories synthetically generated by GPT-3.5 and GPT-4 using only a small vocabulary. It is described in the paper https://arxiv.org/abs/2305.07759. The models discussed in the paper were trained on TinyStories-train.txt (the file tinystories-valid.txt can be used to compute validation loss) and are available on Hugging Face at roneneldan/TinyStories-1M/3M/8M/28M/33M/1Layer-21M. Additional resources include a compressed archive containing all stories together with their metadata and the prompts used to generate each story, as well as a new version of the dataset based on GPT-4 generations.
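To work with the raw training and validation text files, a first step is splitting them into individual stories. The following is a minimal Python sketch, assuming the `.txt` files separate stories with an `<|endoftext|>` marker (an assumption to check against the actual files before use). `split_stories`, `vocabulary`, and `read_stories` are hypothetical helpers, and the regex word counter is a crude stand-in, not the tokenizer used by the paper's models.

```python
import re
from collections import Counter

# ASSUMPTION: stories in the raw .txt files are separated by this marker.
# Verify against the downloaded files before relying on it.
STORY_DELIMITER = "<|endoftext|>"


def split_stories(raw_text: str, delimiter: str = STORY_DELIMITER) -> list[str]:
    """Split a raw text dump into individual, whitespace-trimmed stories."""
    return [s.strip() for s in raw_text.split(delimiter) if s.strip()]


def vocabulary(stories: list[str]) -> Counter:
    """Count lowercase word tokens across all stories (crude regex tokenizer)."""
    counts: Counter = Counter()
    for story in stories:
        counts.update(re.findall(r"[a-z']+", story.lower()))
    return counts


def read_stories(path: str) -> list[str]:
    """Hypothetical helper: load a file such as TinyStories-train.txt."""
    with open(path, encoding="utf-8") as f:
        return split_stories(f.read())


if __name__ == "__main__":
    sample = (
        "Once upon a time, there was a little cat.\n"
        "<|endoftext|>\n"
        "Tom had a red ball. He liked the ball.\n"
        "<|endoftext|>\n"
    )
    stories = split_stories(sample)
    print(len(stories))        # 2
    print(vocabulary(stories)["a"])  # 3
```

Counting distinct words with `vocabulary` gives a quick sanity check on the "small vocabulary" property; for model training, the tokenizer that ships with the chosen model should be used instead.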
Provided by:
dark-xet



