chaiamy/reddit_dataset_197
Source: Hugging Face. Updated 2025-03-25; indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/chaiamy/reddit_dataset_197
Description:
The Bittensor Subnet 13 Reddit Dataset is part of the Bittensor Subnet 13 decentralized network and contains preprocessed Reddit data. Network miners update it continuously, providing a real-time stream of Reddit content for analytical and machine learning tasks such as sentiment analysis, topic modeling, community analysis, and content categorization. The data is primarily in English but may include other languages because of the decentralized way it is created. Fields include the main content of the post or comment, a sentiment or topic category, the type of entry (post or comment), the subreddit name, the posting datetime, an encoded username, and an encoded URL. Because the dataset is continuously updated, it has no fixed splits; users should create their own splits based on their needs and the timestamps.
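Since the dataset ships without fixed splits, a timestamp-based split could look roughly like the sketch below. It is only an illustration under stated assumptions: the field name `datetime`, the ISO 8601 timestamp format, the helper names `parse_timestamp` and `split_by_timestamp`, and the sample records are all hypothetical and not taken from the dataset itself.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 string into a timezone-aware UTC datetime.

    Assumption: timestamps are ISO 8601 strings; naive values are treated as UTC.
    """
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def split_by_timestamp(records, cutoff, field="datetime"):
    """Put records dated before `cutoff` in train, the rest in test.

    `field` is an assumed column name; adjust it to the real schema.
    """
    train, test = [], []
    for record in records:
        if parse_timestamp(record[field]) < cutoff:
            train.append(record)
        else:
            test.append(record)
    return train, test


# Made-up records standing in for rows of the dataset.
sample = [
    {"text": "first post", "datetime": "2025-01-10T08:00:00Z"},
    {"text": "a comment", "datetime": "2025-02-01T12:30:00Z"},
    {"text": "later post", "datetime": "2025-03-05T21:15:00Z"},
]

cutoff = datetime(2025, 2, 15, tzinfo=timezone.utc)
train, test = split_by_timestamp(sample, cutoff)
print(len(train), len(test))  # prints: 2 1
```

Splitting on a time cutoff, rather than at random, keeps later content out of the training portion, which matters for a stream that keeps growing after any given snapshot.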
Provider:
chaiamy



