SaladTechnologies/fiction-1b
Source: Hugging Face | Updated: 2025-09-17 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/SaladTechnologies/fiction-1b
Description:
The Fiction 1B dataset contains the text of approximately 20,000 works of narrative fiction collected from three sources: Project Gutenberg, AO3, and the Internet Archive. The dataset has been processed to remove non-narrative content such as license text and metadata. It is intended for training language models, particularly for fill-mask and text-generation training.
Provider:
SaladTechnologies



