recursal/OKReddit-ReleaseCandidate4
Source: Hugging Face. Updated: 2025-07-02. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/recursal/OKReddit-ReleaseCandidate4
Description:
OKReddit is a filtered collection of Reddit submissions and comments from 2005 to 2023, totaling approximately 6.5 TiB (an estimated 600M rows of Reddit submissions). The dataset is prepared for research or archival purposes. It is mainly in English, with some posts in other languages. It includes many high-quality subreddits and can be used for natural language processing tasks such as text classification, language modeling, sentiment analysis, and topic modeling.
Provider:
recursal
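
For rough storage and bandwidth planning, the two figures quoted in the description (about 6.5 TiB and an estimated 600M rows) can be combined into an average size per row. The sketch below is only back-of-the-envelope arithmetic, not a measurement. It assumes the 6.5 TiB covers the whole collection and that each row is one submission, so the average folds in any comments stored with it. The listing does not say whether the size is compressed or uncompressed.

```python
# Back-of-the-envelope sizing from the two figures quoted in the description.
# Assumption: the total size spans the whole collection and each row is one
# submission, so the per-row average includes any comments stored with it.
TIB = 1024 ** 4  # bytes in one tebibyte

total_bytes = 6.5 * TIB        # "approximately 6.5 TiB"
estimated_rows = 600_000_000   # "an estimated 600M rows"

avg_bytes_per_row = total_bytes / estimated_rows
avg_kib_per_row = avg_bytes_per_row / 1024

print(f"{avg_kib_per_row:.1f} KiB per row on average")
```

This works out to roughly 11.6 KiB per row. Because both inputs are estimates, the result is best read as an order-of-magnitude guide.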



