natong19/wildchat-1m-filtered
Hugging Face | Updated 2025-12-15 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/natong19/wildchat-1m-filtered
Description:
natong19/wildchat-1m-filtered is a filtered version of allenai/WildChat-1M, a collection of one million real-world conversations with ChatGPT. It contains both non-toxic and toxic conversations. The dataset was cleaned in several steps: filtering REDACTED entries, format validation, removing pure duplicate user turns, MinHash deduplication, semantic deduplication, and filtering repetitive entries. The result is 443,500 samples.
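The description names MinHash deduplication without giving details. As a rough illustration only, the following is a minimal sketch of how MinHash-based near-duplicate removal can work in general. It is not the pipeline used to build this dataset: the function names, the word 3-gram shingling, the 64 hash functions, and the 0.8 threshold are all assumptions chosen for the example.

```python
import hashlib


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in the text."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash_signature(text, num_perm=64, n=3):
    """Build a signature: the minimum hash of the shingles under each seeded hash."""
    grams = shingles(text, n)
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for g in grams
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def dedup(texts, threshold=0.8):
    """Keep the first text of each near-duplicate group (pairwise, O(n^2))."""
    kept, kept_sigs = [], []
    for text in texts:
        sig = minhash_signature(text)
        if all(estimated_jaccard(sig, s) < threshold for s in kept_sigs):
            kept.append(text)
            kept_sigs.append(sig)
    return kept


if __name__ == "__main__":
    samples = [
        "how do I sort a list in python",
        "how do I sort a list in python",
        "write a poem about the sea in autumn",
    ]
    for text in dedup(samples):
        print(text)
```

The pairwise comparison here is only workable for small inputs. At the scale of a million conversations, signatures are usually bucketed with locality-sensitive hashing so that only likely duplicates are compared.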
Provided by:
natong19



