zengsdfew/reddit_dataset_44
Hugging Face · Updated: 2025-03-06 · Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/zengsdfew/reddit_dataset_44
Summary:
The Bittensor Subnet 13 Reddit Dataset is part of the Bittensor Subnet 13 decentralized network and contains preprocessed Reddit posts and comments. It is continuously updated as a real-time data stream and is suited to machine learning tasks such as sentiment analysis, topic modeling, community analysis, and content categorization. The content is primarily in English but may include other languages. Each record includes the text content, a label, the data type, the community name, a timestamp, an encoded username, and an encoded URL. No predefined splits are provided, so users need to create their own based on their requirements. The data is sourced from public Reddit content, and all usernames and URLs are encoded to protect privacy.
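Because the dataset ships without predefined splits, one option for a continuously updated stream is a time-based split, so evaluation data is always newer than training data. The sketch below is a minimal illustration under stated assumptions: records are plain dicts, the timestamp lives in a field named `datetime` holding ISO-8601 strings, and the other field names (`text`, `dataType`, `communityName`) are assumed from the dataset card rather than confirmed here. The helper names `parse_timestamp` and `temporal_split` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical record shape; field names are assumptions based on the dataset card
# (text, label, dataType, communityName, datetime, username_encoded, url_encoded).
Record = Dict[str, str]


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; naive values are assumed to be UTC."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def temporal_split(records: List[Record], cutoff: str) -> Tuple[List[Record], List[Record]]:
    """Return (records before cutoff, records at or after cutoff) by the 'datetime' field."""
    cutoff_dt = parse_timestamp(cutoff)
    train: List[Record] = []
    test: List[Record] = []
    for record in records:
        if parse_timestamp(record["datetime"]) < cutoff_dt:
            train.append(record)
        else:
            test.append(record)
    return train, test


if __name__ == "__main__":
    # Illustrative in-memory records, not real dataset rows.
    sample = [
        {"text": "first post", "dataType": "post", "communityName": "r/example",
         "datetime": "2025-03-01T10:00:00Z"},
        {"text": "a reply", "dataType": "comment", "communityName": "r/example",
         "datetime": "2025-03-05T08:30:00Z"},
    ]
    train, test = temporal_split(sample, "2025-03-03T00:00:00Z")
    print(len(train), len(test))
```

A time-based cutoff avoids leaking future posts into training, which a random split of a live stream would allow. Loading the records themselves (for example with the Hugging Face `datasets` library) is left out so the sketch stays self-contained.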
Provided by:
zengsdfew



