gsjcm/reddit_dataset_28

Source: Hugging Face. Updated: 2025-10-12. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/gsjcm/reddit_dataset_28
Resource description:
This is the Bittensor Subnet 13 Reddit Dataset, which contains preprocessed Reddit data that miners in the Bittensor Subnet 13 network update continuously for analytical and machine learning tasks. It supports tasks such as sentiment analysis, topic modeling, community analysis, and content categorization. The data is primarily in English but may be multilingual. Each record is a Reddit post or comment and includes its content, label, data type, community name, datetime, encoded username, and encoded URL. The dataset is continuously updated and has no fixed splits. The data is sourced from public posts and comments on Reddit, in adherence with the platform's terms of service and API usage guidelines. All usernames and URLs are encoded to protect user privacy. The dataset may have issues with data quality, noise, spam, irrelevant content, temporal bias, and limited representativeness. It is released under the MIT license, and use of it requires compliance with Reddit's terms of use. Research that uses the dataset must cite it as the source.
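Since the dataset ships without fixed splits, users typically have to create their own. The following is a minimal sketch of one option, a cutoff on the record timestamp. The field names (`text`, `label`, `data_type`, `community_name`, `created_at`, `username_encoded`, `url_encoded`), the `split_by_time` helper, and the sample records are all assumptions made for illustration; check the dataset card for the actual column names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical record layout mirroring the fields listed in the description.
# The real column names may differ; these are assumptions.
@dataclass
class Record:
    text: str              # content of the post or comment
    label: str             # label attached to the record
    data_type: str         # e.g. "post" or "comment"
    community_name: str    # subreddit the record came from
    created_at: datetime   # timestamp of the post or comment
    username_encoded: str  # encoded username (privacy-preserving)
    url_encoded: str       # encoded URL (privacy-preserving)


def split_by_time(records, cutoff):
    """Put records dated before `cutoff` in train and the rest in test."""
    train = [r for r in records if r.created_at < cutoff]
    test = [r for r in records if r.created_at >= cutoff]
    return train, test


# Made-up sample records, for demonstration only.
records = [
    Record("First post", "news", "post", "r/example",
           datetime(2025, 9, 1, tzinfo=timezone.utc), "u1", "l1"),
    Record("A reply", "news", "comment", "r/example",
           datetime(2025, 9, 20, tzinfo=timezone.utc), "u2", "l2"),
    Record("Later post", "tech", "post", "r/sample",
           datetime(2025, 10, 5, tzinfo=timezone.utc), "u3", "l3"),
]

cutoff = datetime(2025, 10, 1, tzinfo=timezone.utc)
train, test = split_by_time(records, cutoff)
print(len(train), len(test))
```

A time-based cutoff keeps later records out of the training portion, which is one way to limit the effect of the temporal bias the description warns about. A random split is an equally valid choice when time ordering does not matter.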
Provider:
gsjcm