metapoiesis/171k_bluesky_threads
Source: Hugging Face. Updated 2024-12-11; indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/metapoiesis/171k_bluesky_threads
Description:
---
dataset_info:
  features:
  - name: thread
    list:
    - name: author
      dtype: string
    - name: created_at
      dtype: string
    - name: text
      dtype: string
    - name: uri
      dtype: string
  - name: turns
    dtype: int64
  splits:
  - name: train
    num_bytes: 192510751
    num_examples: 171146
  download_size: 108775087
  dataset_size: 192510751
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
outdated! check out [metalure/733k_bluesky_threads](https://hf.co/datasets/metalure/733k_bluesky_threads)!
all threads present in [alpindale/two-million-bluesky-posts](https://hf.co/datasets/alpindale/two-million-bluesky-posts)
`turns`: mean 4.537996798055461, median 3
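
the mean and median can be recomputed from the `turns` column. a minimal sketch using only the standard library; `summarize_turns` is a hypothetical helper, and the toy values are made up for illustration, not taken from the dataset:

```python
from statistics import mean, median


def summarize_turns(turns):
    """Return (mean, median) for an iterable of per-thread turn counts."""
    values = list(turns)
    if not values:
        raise ValueError("no turn counts given")
    return mean(values), median(values)


# Toy values for illustration only; these are NOT drawn from the dataset.
toy_turns = [2, 3, 3, 5, 10]
avg, mid = summarize_turns(toy_turns)
print(f"mean={avg:.3f} median={mid}")
```

with the `datasets` library, the real column could presumably be passed in as `ds["turns"]` after loading the train split.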
example datapoint with one reply:
```
{'thread': [
{'author': 'did:plc:fq3awsbogpqzozhbt7sul264',
'created_at': '2024-11-27T12:15:56.566Z',
'text': 'i just listened to full albums from imogen heap, ladytron, linkin park, weezer, and system of a down, as opposed to sleeping ^_^',
'uri': 'at://did:plc:fq3awsbogpqzozhbt7sul264/app.bsky.feed.post/3lbwjqsnyjk2j'},
{'author': 'did:plc:3guei24uepx42m5g7dktzo2n',
'created_at': '2024-11-27T12:38:07.010Z',
'text': 'sleeping is so overrated.',
'uri': 'at://did:plc:3guei24uepx42m5g7dktzo2n/app.bsky.feed.post/3lbwkyhhvbc2j'}]}
```
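
each post's `uri` is an AT URI of the form `at://<did>/<collection>/<rkey>`, and `created_at` is an ISO 8601 timestamp stored as a string. a rough sketch of pulling those apart for the example above; `PostRef`, `parse_at_uri`, and `parse_created_at` are hypothetical helper names, not part of the dataset or any library:

```python
from datetime import datetime
from typing import NamedTuple


class PostRef(NamedTuple):
    did: str         # repository (author) identifier
    collection: str  # record type, e.g. app.bsky.feed.post
    rkey: str        # record key within the collection


def parse_at_uri(uri: str) -> PostRef:
    """Split an at://did/collection/rkey URI into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3:
        raise ValueError(f"expected did/collection/rkey, got: {uri!r}")
    return PostRef(*parts)


def parse_created_at(value: str) -> datetime:
    """Parse the string timestamp into a timezone-aware datetime."""
    # Older Python versions reject a trailing 'Z', so spell out the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# The two posts from the example datapoint (text omitted for brevity).
thread = [
    {"author": "did:plc:fq3awsbogpqzozhbt7sul264",
     "created_at": "2024-11-27T12:15:56.566Z",
     "uri": "at://did:plc:fq3awsbogpqzozhbt7sul264/app.bsky.feed.post/3lbwjqsnyjk2j"},
    {"author": "did:plc:3guei24uepx42m5g7dktzo2n",
     "created_at": "2024-11-27T12:38:07.010Z",
     "uri": "at://did:plc:3guei24uepx42m5g7dktzo2n/app.bsky.feed.post/3lbwkyhhvbc2j"},
]

root, reply = thread
root_ref = parse_at_uri(root["uri"])
reply_delay = parse_created_at(reply["created_at"]) - parse_created_at(root["created_at"])
print(root_ref.collection, root_ref.rkey, reply_delay)
```

in this example the DID inside each `uri` matches the post's `author` field; the card does not say whether that holds for every row.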
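### Dataset information
#### Features
1. **thread**: a list whose items have the following subfields:
   - `author`: string; identifies the author of the post (a DID in the example)
   - `created_at`: string; the creation time of the post
   - `text`: string; the text content of the post
   - `uri`: string; the URI of the post
2. **turns**: 64-bit integer; the number of turns in the thread
#### Splits
- train: 192510751 bytes, 171146 examples
- Download size: 108775087 bytes
- Dataset size: 192510751 bytes
#### Configs
- default: the data files for the train split are at `data/train-*`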
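
The schema and split listed above can be checked in code. The following is a minimal sketch, assuming the `datasets` library is installed and the repository is reachable. `matches_schema` and `load_train_split` are hypothetical helper names, and the check mirrors only what the feature list states (four string fields per post and an integer `turns`).

```python
EXPECTED_POST_KEYS = {"author", "created_at", "text", "uri"}


def matches_schema(record: dict) -> bool:
    """Check one record against the feature list from the card."""
    thread = record.get("thread")
    turns = record.get("turns")
    if not isinstance(thread, list):
        return False
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(turns, bool) or not isinstance(turns, int):
        return False
    return all(
        isinstance(post, dict)
        and set(post) == EXPECTED_POST_KEYS
        and all(isinstance(value, str) for value in post.values())
        for post in thread
    )


def load_train_split():
    """Load the single train split of the default config (needs network)."""
    # Imported lazily so matches_schema stays usable without the library.
    from datasets import load_dataset

    return load_dataset("metapoiesis/171k_bluesky_threads", split="train")


# Offline check with a hand-written record (placeholder values, not real data).
sample = {
    "thread": [
        {"author": "did:plc:example", "created_at": "2024-11-27T12:15:56.566Z",
         "text": "hello", "uri": "at://did:plc:example/app.bsky.feed.post/abc"},
    ],
    "turns": 1,
}
print(matches_schema(sample))
```

Calling `load_train_split()` and running `matches_schema` over a handful of rows would be one way to confirm that the downloaded data agrees with the card.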
Provider:
metapoiesis



