coldmind/reddit_dataset_172
Source: Hugging Face | Updated: 2025-02-03 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/coldmind/reddit_dataset_172
Description:
The Bittensor Subnet 13 Reddit Dataset is part of the Bittensor Subnet 13 decentralized network and contains preprocessed Reddit data. The data is contributed by network miners and updated in real time. It is intended for a range of analytical and machine learning tasks, including sentiment analysis, topic modeling, community analysis, and content categorization. The dataset is primarily in English but may include other languages. Each instance represents a single Reddit post or comment, with fields for text content, sentiment or topic label, data type (post or comment), community name, posting date, encoded username, and encoded URL. Because the dataset is continuously updated, it has no fixed splits; users need to create their own splits based on timestamps. The data comes from public posts and comments on Reddit, and all usernames and URLs are encoded to protect privacy.
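Since the description leaves splitting to the user, the following is a minimal sketch of a timestamp-based split. It assumes each record is a dictionary whose posting date is an ISO 8601 string stored under a key named `datetime`; that key name, the `text` key in the sample, and the helper names `parse_timestamp` and `split_by_time` are illustrative assumptions, not taken from the listing, so check the actual column names before use.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 string into a timezone-aware UTC datetime."""
    # Accept a trailing "Z" (UTC), which older Python versions do not parse.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def split_by_time(records, cutoff, date_field="datetime"):
    """Split records into (before cutoff, at or after cutoff).

    `date_field` is a hypothetical column name for the posting date.
    """
    cutoff_dt = parse_timestamp(cutoff)
    train, test = [], []
    for rec in records:
        if parse_timestamp(rec[date_field]) < cutoff_dt:
            train.append(rec)
        else:
            test.append(rec)
    return train, test


# Hypothetical sample records, for illustration only.
sample = [
    {"text": "a", "datetime": "2025-01-10T08:00:00Z"},
    {"text": "b", "datetime": "2025-02-01T00:00:00Z"},
    {"text": "c", "datetime": "2025-01-31T23:59:59Z"},
]
train, test = split_by_time(sample, "2025-02-01")
```

Splitting on a time cutoff rather than at random keeps later posts out of the training portion, which matters for a dataset that keeps growing. The same comparison could be used as a filter predicate in whichever data-loading library is used to read the dataset.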
Provider:
coldmind



