wildwood77/bluesky-embeddings-daily
Source: Hugging Face. Updated 2025-08-09. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/wildwood77/bluesky-embeddings-daily
Description:
The Bluesky Embeddings Feed dataset contains vector embeddings of public posts from the Bluesky social network, generated for semantic search, content discovery, and language model experimentation. Each record includes the post URI, the creation timestamp, the text content, and a 384-dimensional float vector representing the semantic content of the post. The data is stored in Apache Parquet format and is updated twice daily. The dataset is suited to tasks such as semantic search, topic clustering, and language model evaluation on real-world social content. It contains only public posts and no private user information.
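As an illustration of the semantic search use case mentioned above, the following is a minimal sketch of a nearest-neighbour lookup by cosine similarity over 384-dimensional vectors. It does not read the real dataset: the rows are synthetic, and the field names `uri`, `text`, and `embedding` are assumptions chosen for illustration, since the listing does not give the actual column names. The helper names `cosine_similarity` and `top_k` are likewise hypothetical.

```python
import math
import random

EMBEDDING_DIM = 384  # dimensionality stated in the dataset description


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=3):
    """Return the k (score, row) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query, row["embedding"]), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Synthetic stand-in rows. Field names and URIs are placeholders (assumed),
# not the dataset's real schema or content.
rng = random.Random(0)
rows = [
    {
        "uri": f"at://example/post/{i}",
        "text": f"placeholder post {i}",
        "embedding": [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)],
    }
    for i in range(100)
]

# Use row 42's own vector as the query, so that row should rank first.
query = rows[42]["embedding"]
results = top_k(query, rows, k=3)
for score, row in results:
    print(f"{score:.3f}  {row['uri']}")
```

The sketch uses only the standard library so that it runs without extra dependencies. For the full dataset, a vectorised library or an approximate nearest-neighbour index would likely be a better fit than this linear scan.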
Provider:
wildwood77



