wolfghost/reddit_dataset_107
Source: Hugging Face | Updated: 2025-02-11 | Indexed: 2025-08-30
Download link:
https://hf-mirror.com/datasets/wolfghost/reddit_dataset_107
Description:
The Bittensor Subnet 13 Reddit dataset is a continuously updated dataset, collected over a decentralized network, that contains preprocessed Reddit posts and comments. It is suited to analytical tasks such as sentiment analysis, topic modeling, community analysis, and content classification. The dataset is primarily in English but may include content in other languages. Each data instance includes the text content, a label, the data type, the community name, the posting date, an encoded username, and an encoded URL. Because the dataset is updated continuously, it has no fixed splits; users need to create their own splits based on timestamps. The data is sourced from public posts on Reddit, in accordance with the platform's terms of service and API usage guidelines. All usernames and URLs are encoded to protect privacy. Users of the dataset should be aware of its potential biases and limitations.
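Since the dataset ships without fixed splits, a timestamp-based split has to be built by the user. The following is a minimal sketch of one way to do that, not the dataset authors' method. The field name `datetime`, the ISO-8601 date format, the sample records, and the helper `split_by_time` are all assumptions made for illustration; check the actual column names and date format in the dataset before adapting it.

```python
from datetime import datetime


def split_by_time(records, cutoff, key="datetime"):
    """Partition records into (train, test) around a cutoff date.

    `key` is a hypothetical field name holding an ISO-8601 date string;
    replace it with the dataset's actual posting-date column.
    """
    cutoff_dt = datetime.fromisoformat(cutoff)
    train, test = [], []
    for rec in records:
        ts = datetime.fromisoformat(rec[key])
        # Records strictly before the cutoff go to train; the rest go to test.
        (train if ts < cutoff_dt else test).append(rec)
    return train, test


# Made-up sample records, for illustration only.
sample = [
    {"text": "first post", "datetime": "2025-01-15"},
    {"text": "a comment", "datetime": "2025-02-01"},
    {"text": "later post", "datetime": "2025-02-10"},
]

train, test = split_by_time(sample, "2025-02-01")
```

Splitting on a date rather than at random keeps later posts out of the training portion, which matters for a dataset that keeps growing over time.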
Provider:
wolfghost



