unclegravity/puertorico-reddit
收藏Hugging Face2023-12-16 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/unclegravity/puertorico-reddit
下载链接
链接失效反馈官方服务:
资源简介:
This dataset contains almost the entire history of the /r/puertorico subreddit (2012-2021), filtered into comment/reply pairs. Childness comments are omitted, and so is the OP content.
Format: ChatML
Disclaimer:
Content has not been filtered so if you look for things that you don't like, you are sure to find them.
TODO:
- add OP to dataset as "top-level" comment.
- create simple completion dataset that includes everything
- clean up dataset. There's still a bunch of bot responses, and removed/deleted comments.
提供机构:
unclegravity
原始信息汇总
数据集概述
数据内容
- 数据集包含/r/puertorico子论坛从2012年到2021年的几乎全部历史数据,经过筛选形成评论/回复对。
- 不包括子评论和原始帖子内容。
数据格式
- 数据格式为ChatML。
注意事项
- 数据内容未经筛选,可能包含用户不希望看到的内容。
未来计划
- 计划将原始帖子内容添加为“顶级”评论。
- 计划创建一个包含所有内容的简单完成数据集。
- 计划清理数据集,移除机器人回复和已删除/移除的评论。



