Roronotalt/bluesky
Source: Hugging Face | Updated: 2024-12-23 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/Roronotalt/bluesky
Description:
This dataset contains 5 million public posts collected from Bluesky Social's firehose API, intended for machine learning research and experimentation with social media data. It extends Alpindale's original 2-million-post dataset and adds author handles, image URLs, and metadata from the posts. Features include post type, text, creation time, author information, and embedded images with metadata. Intended uses include studying social media trends, content moderation, and conversation structures. The data is unfiltered, but it has been deduplicated and is sorted by the author column.
The dataset is curated by Roro, licensed under MIT, and provided as a train split. The README gives instructions for downloading, loading, and using the dataset, including converting it to a pandas dataframe and handling image data.
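The description says the posts are deduplicated and sorted by author. The following is a minimal sketch of what such a pass could look like on a handful of records, using only the Python standard library. It is not the curator's actual pipeline: the sample records, the field names (`type`, `text`, `created_at`, `author`), and the choice of duplicate key are all assumptions made for illustration.

```python
# Hypothetical sample records. The field names are assumptions based on the
# features listed in the description (type, text, creation time, author).
posts = [
    {"type": "post", "text": "hello world", "created_at": "2024-12-01T10:00:00Z", "author": "bob.example"},
    {"type": "post", "text": "first post", "created_at": "2024-12-01T09:00:00Z", "author": "alice.example"},
    {"type": "post", "text": "hello world", "created_at": "2024-12-01T10:00:00Z", "author": "bob.example"},
    {"type": "post", "text": "second post", "created_at": "2024-12-02T09:00:00Z", "author": "alice.example"},
]


def dedupe_and_sort(records):
    """Drop exact duplicate posts, then sort the remainder by author handle."""
    seen = set()
    unique = []
    for rec in records:
        # Assumed duplicate key: same author, text, and creation time.
        key = (rec["author"], rec["text"], rec["created_at"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    # sorted() is stable, so posts by the same author keep their input order.
    return sorted(unique, key=lambda r: r["author"])


cleaned = dedupe_and_sort(posts)
for rec in cleaned:
    print(rec["author"], "|", rec["text"])
```

Because the data is otherwise unfiltered, any content filtering for a specific study would have to be added as a separate step on top of a pass like this one.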
Provided by:
Roronotalt



