Roronotalt/bluesky-ten-million
Hugging Face | Updated: 2024-12-01 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/Roronotalt/bluesky-ten-million
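Description:
This dataset contains 5 million public posts collected from Bluesky Social's firehose API, intended for machine learning research and experimentation with social media data. It was inspired by Alpindale's original dataset of 2 million posts and extends it with more data. That dataset did not include author handles or the image URLs and metadata attached to posts; because the images and their captions could be very valuable for training, they have been collected here. This is the small version of a forthcoming larger dataset, intended for testing formatting and for smaller projects. This dataset is my own work and is not affiliated with Bluesky or any potential employer.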
This dataset contains 5 million public posts collected from Bluesky Social's firehose API, intended for machine learning research and experimentation with social media data. Its features include type, text, created_at, author, and others. Intended uses include studying social media trends, content moderation, and conversation structures. Detailed instructions for downloading and using the dataset are also provided.
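As a rough illustration of working with the features named above, the following Python sketch streams records and tallies or filters them by field. It rests on several assumptions that the listing does not confirm: that the third-party `datasets` package is installed, that the repository exposes a split named "train", and that each record is a flat mapping with `type`, `text`, `created_at` and `author` keys. The helper names and the sample records are hypothetical and exist only to make the example self-contained.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator

REPO_ID = "Roronotalt/bluesky-ten-million"  # repository name taken from the listing

# Hypothetical records shaped after the feature names in the description;
# the values are invented for illustration and are not taken from the dataset.
SAMPLE = [
    {"type": "post", "text": "Hello Bluesky", "created_at": "2024-11-27T12:00:00Z", "author": "alice.example"},
    {"type": "reply", "text": "hello back", "created_at": "2024-11-27T12:01:00Z", "author": "bob.example"},
    {"type": "post", "text": "Unrelated note", "created_at": "2024-11-27T12:02:00Z", "author": "carol.example"},
]


def stream_posts(split: str = "train") -> Iterator[Dict[str, Any]]:
    """Stream records from the Hub instead of downloading everything first.

    Assumption: a split called "train" exists; adjust if the repository differs.
    """
    from datasets import load_dataset  # third-party, imported lazily

    yield from load_dataset(REPO_ID, split=split, streaming=True)


def count_types(posts: Iterable[Dict[str, Any]]) -> Counter:
    """Tally the `type` field, skipping records where it is missing or None."""
    return Counter(p["type"] for p in posts if p.get("type") is not None)


def posts_with_text(posts: Iterable[Dict[str, Any]], keyword: str) -> Iterator[Dict[str, Any]]:
    """Yield records whose `text` contains `keyword`, ignoring case."""
    needle = keyword.lower()
    for post in posts:
        if needle in (post.get("text") or "").lower():
            yield post
```

Both helpers accept any iterable of records, so they can be tried on `SAMPLE` first and then on a bounded slice of the stream, for example `count_types(itertools.islice(stream_posts(), 1000))`, before committing to a pass over all 5 million posts.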
Provided by:
Roronotalt



