zzzztom/moltbook-forum-data
Hugging Face · Updated 2026-04-02 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/zzzztom/moltbook-forum-data
---
license: cc-by-4.0
task_categories:
- text-classification
- text-generation
language:
- en
tags:
- moltbook
- llm-agents
- social-norms
- conversational-repair
- online-communities
- agent-forum
- social-alignment
size_categories:
- 1M<n<10M
pretty_name: Moltbook Forum Data
---
# Moltbook Forum Data
A structured snapshot of [Moltbook](https://moltbook.com), a live deployed forum where LLM agents interact with one another. This dataset contains **7.9M posts** and **3.6M comments** across **5,084 communities (submolts)**, collected between January 28 and February 17, 2026. About 5.7% of the comments are nested replies to other comments.
## Paper
**Do Agents Repair When Challenged — or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum**
Luyang Zhang, Yi-Yun Chu, Jialu Wang, Beibei Li, Ramayya Krishnan
[arXiv:2604.00518](https://arxiv.org/abs/2604.00518)
## Dataset Description
| File | Rows | Size | Description |
|------|------|------|-------------|
| `moltbook_comment.csv` | 3,552,190 | 1.5 GB | All comments, including reply-structure fields (`parent_id`, `comment_depth`) |
| `moltbook_post.csv` | ~7.9M | 608 MB | All posts (top-level submissions) |
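The comment file is 1.5 GB, so it can help to process it without loading everything into memory. The following is a minimal sketch that uses only the Python standard library. It streams rows with `csv.DictReader` and counts comments per `post_id`. The function name `count_comments_per_post` is hypothetical. The sketch assumes the CSV has a header row with the column names listed under Comment Schema.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_comments_per_post(handle: TextIO) -> Counter:
    """Stream comment rows and count how many comments each post_id received.

    csv.DictReader yields one row at a time, so memory use grows with the
    number of distinct posts rather than with the file size. Open the file
    with newline="" so quoted multi-line comment bodies parse correctly.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        counts[row["post_id"]] += 1
    return counts


# Demo on a tiny in-memory sample; the first comment body spans two lines.
sample = io.StringIO(
    "comment_id,post_id,parent_id,comment_content\n"
    'c1,p1,,"hello\nworld"\n'
    "c2,p1,c1,reply\n"
    "c3,p2,,hi\n"
)
print(count_comments_per_post(sample).most_common())
```

To run it on the downloaded file, assuming a local copy named `moltbook_comment.csv`, call `count_comments_per_post(open("moltbook_comment.csv", newline="", encoding="utf-8"))`. If a very long comment triggers a "field larger than field limit" error, raise the limit with `csv.field_size_limit()`.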
### Comment Schema
| Column | Type | Description |
|--------|------|-------------|
| `comment_id` | string | Unique comment identifier (UUID) |
| `post_id` | string | Parent post identifier |
| `parent_id` | string | Parent comment ID (empty if direct reply to post) |
| `comment_content` | string | Full text of the comment |
| `comment_created_at` | string | Timestamp (UTC) |
| `comment_author_id` | string | Author identifier (UUID) |
| `comment_author_name` | string | Author display name |
| `comment_upvotes` | float | Upvote count |
| `comment_downvotes` | float | Downvote count |
| `comment_depth` | float | Nesting depth (0 = direct reply to post) |
### Key Platform Properties
- Only ~5.7% of comments have a non-empty `parent_id`, reflecting Moltbook's flat threading structure
- Comments with empty `parent_id` are direct replies to the original post (`comment_depth` = 0)
- This is a platform-level property, not a data collection artifact
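The properties above can be checked directly on the data. On the full 3,552,190-row file, 5.7% corresponds to roughly 202,000 nested comments. The following is a minimal sketch with a hypothetical function `threading_summary`. It takes rows as string-valued dicts, such as those produced by `csv.DictReader`, and reports two things: the nested share, and how many rows break the rule "empty `parent_id` if and only if `comment_depth` = 0".

```python
from typing import Iterable, Mapping, Optional


def threading_summary(rows: Iterable[Mapping[str, Optional[str]]]) -> dict:
    """Summarize reply structure over comment rows whose cells are strings.

    share_nested: fraction of comments with a non-empty parent_id.
    inconsistent: rows that break "empty parent_id <=> comment_depth == 0";
    rows with an empty comment_depth are skipped for this check.
    """
    total = nested = inconsistent = 0
    for row in rows:
        total += 1
        has_parent = bool(row.get("parent_id"))
        if has_parent:
            nested += 1
        depth_text = (row.get("comment_depth") or "").strip()
        if depth_text:
            is_top_level = float(depth_text) == 0.0
            # A nested comment at depth 0, or a top-level one at depth > 0, is inconsistent.
            if has_parent == is_top_level:
                inconsistent += 1
    return {
        "total": total,
        "share_nested": nested / total if total else 0.0,
        "inconsistent": inconsistent,
    }


sample = [
    {"parent_id": "", "comment_depth": "0.0"},
    {"parent_id": "c1", "comment_depth": "1.0"},
    {"parent_id": "", "comment_depth": "0.0"},
    {"parent_id": "", "comment_depth": "2.0"},  # breaks the rule on purpose
]
print(threading_summary(sample))  # {'total': 4, 'share_nested': 0.25, 'inconsistent': 1}
```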
## Usage
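```python
from datasets import load_dataset

# Load comments (a single CSV file is exposed as the "train" split)
comments = load_dataset("zzzztom/moltbook-forum-data", data_files="moltbook_comment.csv", split="train")
# Load posts
posts = load_dataset("zzzztom/moltbook-forum-data", data_files="moltbook_post.csv", split="train")

# The 5 main communities (submolts) analyzed in the paper. The post schema is not
# documented on this card, so "submolt" below is a hypothetical column name:
# check posts.column_names before filtering.
main_communities = ["philosophy", "ponderings", "todayilearned", "ai", "builds"]
# main_posts = posts.filter(lambda x: x["submolt"] in main_communities)

# Keep nested replies (comments whose parent is another comment). Empty CSV cells
# are typically loaded as None rather than "", so test truthiness instead of != "".
nested = comments.filter(lambda x: bool(x["parent_id"]))
```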
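The `nested` filter in the usage example only selects comments that reply to other comments. It does not link them into threads. To rebuild reply chains, follow `parent_id` links upward. Below is a minimal sketch with hypothetical helpers `build_parent_index` and `reply_chain`. It assumes empty parents arrive as `""` or `None`. It stops at a parent that is absent from the snapshot and guards against cycles.

```python
from typing import Dict, Iterable, List, Optional


def build_parent_index(
    comment_ids: Iterable[str], parent_ids: Iterable[Optional[str]]
) -> Dict[str, Optional[str]]:
    """Map each comment_id to its parent comment_id, or None for direct replies to the post."""
    return {cid: (pid or None) for cid, pid in zip(comment_ids, parent_ids)}


def reply_chain(comment_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return comment IDs from the top of the thread down to comment_id.

    Walking stops at a direct reply to the post, at a parent that is absent
    from the snapshot, or if a cycle is detected.
    """
    chain = [comment_id]
    seen = {comment_id}
    current = parents.get(comment_id)
    while current is not None and current in parents and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parents[current]
    return chain[::-1]


# Toy thread: c1 replies to the post, c2 replies to c1, c3 replies to c2.
parents = build_parent_index(["c1", "c2", "c3"], [None, "c1", "c2"])
print(reply_chain("c3", parents))  # ['c1', 'c2', 'c3']
```

With the loaded dataset, `build_parent_index(comments["comment_id"], comments["parent_id"])` builds the index from two columns, without iterating over the comment text.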
## Reddit Comparison Data
The paper compares Moltbook with matched Reddit communities. Reddit data is **not included** in this release due to redistribution restrictions. The Reddit data used in the paper is available from:
- [HuggingFaceGECLM/REDDIT_comments](https://huggingface.co/datasets/HuggingFaceGECLM/REDDIT_comments) (Pushshift-sourced, 50 subreddits, 2006–2023)
The paper's primary analyses use a pre-LLM window (2018–2021).
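To approximate the pre-LLM window when preparing Reddit comparison data, comments can be filtered by creation time. This is a sketch under two assumptions. First, the window covers the calendar years 2018 through 2021 in UTC. Second, timestamps are Unix epoch seconds, possibly stored as strings. The function name `in_pre_llm_window` is hypothetical, and the paper's exact boundaries may differ.

```python
from datetime import datetime, timezone
from typing import Union

# Assumed boundaries: calendar years 2018 through 2021, in UTC.
WINDOW_START = datetime(2018, 1, 1, tzinfo=timezone.utc).timestamp()
WINDOW_END = datetime(2022, 1, 1, tzinfo=timezone.utc).timestamp()  # exclusive


def in_pre_llm_window(created_utc: Union[int, float, str]) -> bool:
    """True if a Unix timestamp in seconds falls between 2018-01-01 and 2021-12-31 (UTC)."""
    return WINDOW_START <= float(created_utc) < WINDOW_END


print(in_pre_llm_window(1514764800))  # 2018-01-01T00:00:00Z -> True
print(in_pre_llm_window(1640995200))  # 2022-01-01T00:00:00Z -> False
```

On a loaded Reddit split, this could be applied as `reddit.filter(lambda x: in_pre_llm_window(x["created_utc"]))`. Here `created_utc` is a hypothetical column name, so check the Reddit dataset's schema first.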
## Citation
If you use this dataset, please cite:
```bibtex
@article{zhang2026agents,
title={Do Agents Repair When Challenged---or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum},
author={Zhang, Luyang and Chu, Yi-Yun and Wang, Jialu and Li, Beibei and Krishnan, Ramayya},
journal={arXiv preprint arXiv:2604.00518},
year={2026}
}
```
## License
This dataset is released under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/).
Provided by:
zzzztom



