Geralt-Targaryen/CC-Stories
Source: Hugging Face. Updated: 2025-01-15. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/Geralt-Targaryen/CC-Stories
Description:
CC-Stories is a cleaned and deduplicated story dataset containing 679,689 samples. It was deduplicated internally, cross-deduplicated against OpenWebText2, and decontaminated based on n-gram overlap, steps intended to improve data quality and diversity.
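The n-gram overlap decontamination mentioned in the description can be illustrated with a minimal sketch. This is not the dataset author's pipeline: the n-gram size (5), the tokenization, the "drop on any shared n-gram" rule, and the function names `ngrams` and `decontaminate` are all assumptions made for illustration, since the listing does not state them.

```python
import re


def ngrams(text, n):
    """Return the set of word-level n-grams in text (lowercased)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(samples, references, n=5):
    """Drop every sample that shares at least one n-gram with any reference.

    The value of n and the any-overlap rule are assumptions, not the
    settings used to build CC-Stories.
    """
    reference_ngrams = set()
    for ref in references:
        reference_ngrams |= ngrams(ref, n)
    # Keep a sample only if its n-grams are disjoint from the reference set.
    return [s for s in samples if ngrams(s, n).isdisjoint(reference_ngrams)]


if __name__ == "__main__":
    stories = [
        "The quick brown fox jumps over the lazy dog today",
        "Once upon a time there lived a small dragon",
    ]
    benchmark_texts = ["quick brown fox jumps over the lazy dog"]
    print(decontaminate(stories, benchmark_texts))
```

In this toy run the first story shares the 5-gram "quick brown fox jumps over" with the reference text and is removed, while the second is kept. A real pipeline would likely need a more scalable structure (for example hashing n-grams) to handle hundreds of thousands of samples.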
Provided by:
Geralt-Targaryen



