NewEden/Orion-Completion-Asstr-Stories-16K
Hugging Face, updated 2025-01-22, indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/NewEden/Orion-Completion-Asstr-Stories-16K
Description:
Orion is a cleaned subset of the Nyxs Asstr dataset, containing 16K NSFW, NSFL, and SFW stories for completion training. The data went through a series of cleaning steps: pruning unnecessary fields, keeping only English content, tokenizing and filtering by length, exact deduplication, fuzzy deduplication, content-quality rating, and filtering based on that rating. The dataset falls in the 10K to 100K size category.
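The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the pipeline NewEden actually used: the field names (`text`, `rating`), every threshold, the ASCII-ratio language check, the word-count stand-in for a tokenizer, and the `difflib`-based fuzzy match are all assumptions chosen so the example runs with the Python standard library alone.

```python
import hashlib
import re
from difflib import SequenceMatcher


def is_mostly_english(text, threshold=0.9):
    """Crude stand-in for language detection: share of ASCII characters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    return sum(c.isascii() for c in chars) / len(chars) >= threshold


def token_count(text):
    """Word count as a stand-in for a real tokenizer (assumption)."""
    return len(re.findall(r"\w+", text))


def clean_stories(records, min_tokens=5, max_tokens=2000,
                  fuzzy_threshold=0.9, min_rating=3):
    """Apply the listed steps in order; all thresholds are hypothetical."""
    kept = []
    seen_hashes = set()
    for raw in records:
        # 1. Prune unnecessary fields: keep only the text and its rating.
        rec = {"text": raw["text"], "rating": raw.get("rating", 0)}
        text = rec["text"]
        # 2. Keep English content only.
        if not is_mostly_english(text):
            continue
        # 3. Length filter on the token count.
        if not min_tokens <= token_count(text) <= max_tokens:
            continue
        # 4. Exact deduplication on a normalized hash.
        digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        # 5. Fuzzy deduplication against everything kept so far.
        if any(
            SequenceMatcher(None, text, k["text"], autojunk=False).ratio()
            >= fuzzy_threshold
            for k in kept
        ):
            continue
        # 6. Drop stories whose quality rating is below the cutoff.
        if rec["rating"] < min_rating:
            continue
        seen_hashes.add(digest)
        kept.append(rec)
    return kept
```

The pairwise fuzzy comparison here is quadratic in the number of stories, so it only suits small samples; a corpus of this size would more plausibly rely on a hashing scheme such as MinHash, though the listing does not say which method was used.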
Provider:
NewEden



