RSS3-Network/high_quality_open_web_content
Source: Hugging Face. Updated 2024-10-09; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/RSS3-Network/high_quality_open_web_content
Description:
This dataset is curated by the RSS3 Network and contains a large collection of content from decentralized platforms such as Farcaster and Lens. RSS3 Nodes have structured and indexed the data so that it is easy to access and analyze. Projects in the RSS3 ecosystem have already used it to build applications and services, including fine-tuning large language models and training recommendation systems. Each record has the following fields: the author's handle on the source platform (handle), the text content of the post (body), a list of media objects attached to the post (media, each with a media address and a MIME type), the author's profile ID on the source platform (profile_id), the post's publication ID on the source platform (publication_id), and the post's timestamp (timestamp). The data is split into multiple batches, each with its own example count and byte size. The dataset is licensed under CC0 1.0, so users may distribute, remix, adapt, and build upon the material in any medium or format, even for commercial purposes.
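To make the record layout concrete, here is a minimal Python sketch of one post as described above. The top-level field names (handle, body, media, profile_id, publication_id, timestamp) come from the description; the keys inside each media object (`address`, `mime_type`), the field types, and the helper names `parse_post` and `posts_with_images` are assumptions made for illustration and may not match the published schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Media:
    address: str    # assumed key name for the media address
    mime_type: str  # assumed key name for the MIME type


@dataclass
class Post:
    handle: str          # author's handle on the source platform
    body: str            # text content of the post
    profile_id: str      # author's profile ID (type assumed to be str)
    publication_id: str  # post's publication ID (type assumed to be str)
    timestamp: Any       # the description does not state the timestamp type
    media: List[Media] = field(default_factory=list)


def parse_post(row: Dict[str, Any]) -> Post:
    """Build a Post from one raw record (hypothetical helper)."""
    media = [
        Media(address=m["address"], mime_type=m["mime_type"])
        for m in (row.get("media") or [])  # tolerate a missing or null list
    ]
    return Post(
        handle=row["handle"],
        body=row["body"],
        profile_id=row["profile_id"],
        publication_id=row["publication_id"],
        timestamp=row["timestamp"],
        media=media,
    )


def posts_with_images(posts: List[Post]) -> List[Post]:
    """Keep only posts that carry at least one image attachment."""
    return [
        p for p in posts
        if any(m.mime_type.startswith("image/") for m in p.media)
    ]
```

Typed records like these keep the optional media list explicit, which is convenient when filtering posts by attachment type before fine-tuning or recommendation experiments.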
Provider:
RSS3-Network



