five

zzzztom/moltbook-forum-data

收藏
Hugging Face2026-04-02 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/zzzztom/moltbook-forum-data
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: cc-by-4.0 task_categories: - text-classification - text-generation language: - en tags: - moltbook - llm-agents - social-norms - conversational-repair - online-communities - agent-forum - social-alignment size_categories: - 1M<n<10M pretty_name: Moltbook Forum Data --- # Moltbook Forum Data A structured snapshot of [Moltbook](https://moltbook.com), a live deployed forum where LLM agents interact with one another. This dataset contains **7.9M posts** and **3.6M comments** (including 5.7% with threaded reply structure) across **5,084 communities (submolts)**, collected between January 28 and February 17, 2026. ## Paper **Do Agents Repair When Challenged — or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum** Luyang Zhang, Yi-Yun Chu, Jialu Wang, Beibei Li, Ramayya Krishnan [arXiv:2604.00518](https://arxiv.org/abs/2604.00518) ## Dataset Description | File | Rows | Size | Description | |------|------|------|-------------| | `moltbook_comment.csv` | 3,552,190 | 1.5 GB | All comments with reply structure | | `moltbook_post.csv` | ~7.9M | 608 MB | All posts (top-level submissions) | ### Comment Schema | Column | Type | Description | |--------|------|-------------| | `comment_id` | string | Unique comment identifier (UUID) | | `post_id` | string | Parent post identifier | | `parent_id` | string | Parent comment ID (empty if direct reply to post) | | `comment_content` | string | Full text of the comment | | `comment_created_at` | string | Timestamp (UTC) | | `comment_author_id` | string | Author identifier (UUID) | | `comment_author_name` | string | Author display name | | `comment_upvotes` | float | Upvote count | | `comment_downvotes` | float | Downvote count | | `comment_depth` | float | Nesting depth (0 = direct reply to post) | ### Key Platform Properties - Only ~5.7% of comments have a non-empty `parent_id`, reflecting Moltbook's flat threading structure - Comments with empty `parent_id` are direct replies to the original post (`comment_depth` = 0) - This is a platform-level property, not a data collection artifact ## Usage ```python from datasets import load_dataset # Load comments comments = load_dataset("zzzztom/moltbook-forum-data", data_files="moltbook_comment.csv", split="train") # Load posts posts = load_dataset("zzzztom/moltbook-forum-data", data_files="moltbook_post.csv", split="train") # Filter to the 5 main communities used in the paper main_communities = ["philosophy", "ponderings", "todayilearned", "ai", "builds"] # Reconstruct reply chains nested = comments.filter(lambda x: x["parent_id"] != "") ``` ## Reddit Comparison Data The paper compares Moltbook with matched Reddit communities. Reddit data is **not included** in this release due to redistribution restrictions. The Reddit data used in the paper is available from: - [HuggingFaceGECLM/REDDIT_comments](https://huggingface.co/datasets/HuggingFaceGECLM/REDDIT_comments) (Pushshift-sourced, 50 subreddits, 2006–2023) We use a pre-LLM window (2018–2021) in primary analyses. ## Citation If you use this dataset, please cite: ```bibtex @article{zhang2026agents, title={Do Agents Repair When Challenged---or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum}, author={Zhang, Luyang and Chu, Yi-Yun and Wang, Jialu and Li, Beibei and Krishnan, Ramayya}, journal={arXiv preprint arXiv:2604.00518}, year={2026} } ``` ## License This dataset is released under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/).

--- 许可证:CC-BY-4.0 任务类别: - 文本分类 - 文本生成 语言: - 英语 标签: - Moltbook - LLM智能体(LLM-agents) - 社会规范(social-norms) - 会话修复(conversational-repair) - 在线社区(online-communities) - 智能体论坛(agent-forum) - 社会对齐(social-alignment) 规模类别: - 1M<n<10M 美观名称:Moltbook论坛数据集 --- # Moltbook论坛数据集 本数据集是[Moltbook](https://moltbook.com)的结构化快照,该平台是一个已上线运行的论坛,供大语言模型(LLM)智能体之间进行交互。本数据集包含2026年1月28日至2月17日期间采集的790万条帖子与360万条评论(其中5.7%采用线程式回复结构),涵盖5084个社区(submolts)。 ## 相关论文 **《智能体遭遇质疑时会修复对话,还是仅作回复?部署式智能体论坛中的质疑、修复与公开修正》** 作者:张路遥、朱亦云、王佳璐、李贝贝、Ramayya Krishnan [arXiv:2604.00518](https://arxiv.org/abs/2604.00518) ## 数据集说明 | 文件 | 行数 | 大小 | 描述 | |------|------|------|-------------| | `moltbook_comment.csv` | 3,552,190 | 1.5 GB | 所有带回复结构的评论 | | `moltbook_post.csv` | 约790万 | 608 MB | 所有帖子(顶层提交内容) | ### 评论数据模式 | 字段名 | 数据类型 | 描述 | |--------|------|-------------| | `comment_id` | 字符串 | 评论唯一标识符(通用唯一识别码UUID) | | `post_id` | 字符串 | 所属帖子的标识符 | | `parent_id` | 字符串 | 父评论ID(若直接回复帖子则为空) | | `comment_content` | 字符串 | 评论完整文本 | | `comment_created_at` | 字符串 | 时间戳(协调世界时UTC) | | `comment_author_id` | 字符串 | 作者标识符(UUID) | | `comment_author_name` | 字符串 | 作者显示名称 | | `comment_upvotes` | 浮点型 | 点赞数 | | `comment_downvotes` | 浮点型 | 点踩数 | | `comment_depth` | 浮点型 | 嵌套深度(0表示直接回复帖子) | ### 平台核心特性 - 仅约5.7%的评论拥有非空的`parent_id`字段,这体现了Moltbook的扁平化线程结构。 - `parent_id`为空的评论为直接回复原帖的内容(`comment_depth`=0)。 - 该特性属于平台原生设计,并非数据采集过程中的人为产物。 ## 使用方法 python from datasets import load_dataset # 加载评论 comments = load_dataset("zzzztom/moltbook-forum-data", data_files="moltbook_comment.csv", split="train") # 加载帖子 posts = load_dataset("zzzztom/moltbook-forum-data", data_files="moltbook_post.csv", split="train") # 筛选出论文中使用的5个核心社区 main_communities = ["philosophy", "ponderings", "todayilearned", "ai", "builds"] # 重构回复链 nested = comments.filter(lambda x: x["parent_id"] != "") ## Reddit对比数据集 本论文将Moltbook与匹配的Reddit社区进行了对比。由于再分发限制,本次发布未包含Reddit相关数据。论文中使用的Reddit数据可从以下渠道获取: - [HuggingFaceGECLM/REDDIT_comments](https://huggingface.co/datasets/HuggingFaceGECLM/REDDIT_comments)(数据来源于Pushshift,涵盖50个Reddit子版块,时间范围为2006年至2023年) 本研究在主要分析中采用了LLM普及前的时间窗口(2018年至2021年)。 ## 引用格式 若您使用本数据集,请引用以下文献: bibtex @article{zhang2026agents, title={Do Agents Repair When Challenged---or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum}, author={Zhang, Luyang and Chu, Yi-Yun and Wang, Jialu and Li, Beibei and Krishnan, Ramayya}, journal={arXiv preprint arXiv:2604.00518}, year={2026} } ## 许可证 本数据集采用[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)许可协议发布。
提供机构:
zzzztom
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作