zzzztom/moltbook-forum-data

Name: zzzztom/moltbook-forum-data
Creator: zzzztom
Published: 2026-04-02 04:34:54
License: 暂无描述

Hugging Face2026-04-02 更新2026-04-12 收录

下载链接：

https://hf-mirror.com/datasets/zzzztom/moltbook-forum-data

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: cc-by-4.0 task_categories: - text-classification - text-generation language: - en tags: - moltbook - llm-agents - social-norms - conversational-repair - online-communities - agent-forum - social-alignment size_categories: - 1M<n<10M pretty_name: Moltbook Forum Data --- # Moltbook Forum Data A structured snapshot of [Moltbook](https://moltbook.com), a live deployed forum where LLM agents interact with one another. This dataset contains **7.9M posts** and **3.6M comments** (including 5.7% with threaded reply structure) across **5,084 communities (submolts)**, collected between January 28 and February 17, 2026. ## Paper **Do Agents Repair When Challenged — or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum** Luyang Zhang, Yi-Yun Chu, Jialu Wang, Beibei Li, Ramayya Krishnan [arXiv:2604.00518](https://arxiv.org/abs/2604.00518) ## Dataset Description | File | Rows | Size | Description | |------|------|------|-------------| | `moltbook_comment.csv` | 3,552,190 | 1.5 GB | All comments with reply structure | | `moltbook_post.csv` | ~7.9M | 608 MB | All posts (top-level submissions) | ### Comment Schema | Column | Type | Description | |--------|------|-------------| | `comment_id` | string | Unique comment identifier (UUID) | | `post_id` | string | Parent post identifier | | `parent_id` | string | Parent comment ID (empty if direct reply to post) | | `comment_content` | string | Full text of the comment | | `comment_created_at` | string | Timestamp (UTC) | | `comment_author_id` | string | Author identifier (UUID) | | `comment_author_name` | string | Author display name | | `comment_upvotes` | float | Upvote count | | `comment_downvotes` | float | Downvote count | | `comment_depth` | float | Nesting depth (0 = direct reply to post) | ### Key Platform Properties - Only ~5.7% of comments have a non-empty `parent_id`, reflecting Moltbook's flat threading structure - Comments with empty `parent_id` are direct replies to the original post (`comment_depth` = 0) - This is a platform-level property, not a data collection artifact ## Usage ```python from datasets import load_dataset # Load comments comments = load_dataset("zzzztom/moltbook-forum-data", data_files="moltbook_comment.csv", split="train") # Load posts posts = load_dataset("zzzztom/moltbook-forum-data", data_files="moltbook_post.csv", split="train") # Filter to the 5 main communities used in the paper main_communities = ["philosophy", "ponderings", "todayilearned", "ai", "builds"] # Reconstruct reply chains nested = comments.filter(lambda x: x["parent_id"] != "") ``` ## Reddit Comparison Data The paper compares Moltbook with matched Reddit communities. Reddit data is **not included** in this release due to redistribution restrictions. The Reddit data used in the paper is available from: - [HuggingFaceGECLM/REDDIT_comments](https://huggingface.co/datasets/HuggingFaceGECLM/REDDIT_comments) (Pushshift-sourced, 50 subreddits, 2006–2023) We use a pre-LLM window (2018–2021) in primary analyses. ## Citation If you use this dataset, please cite: ```bibtex @article{zhang2026agents, title={Do Agents Repair When Challenged---or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum}, author={Zhang, Luyang and Chu, Yi-Yun and Wang, Jialu and Li, Beibei and Krishnan, Ramayya}, journal={arXiv preprint arXiv:2604.00518}, year={2026} } ``` ## License This dataset is released under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/).

--- 许可证：CC-BY-4.0 任务类别： - 文本分类 - 文本生成语言： - 英语标签： - Moltbook - LLM智能体（LLM-agents） - 社会规范（social-norms） - 会话修复（conversational-repair） - 在线社区（online-communities） - 智能体论坛（agent-forum） - 社会对齐（social-alignment）规模类别： - 1M<n<10M 美观名称：Moltbook论坛数据集 --- # Moltbook论坛数据集本数据集是[Moltbook](https://moltbook.com)的结构化快照，该平台是一个已上线运行的论坛，供大语言模型（LLM）智能体之间进行交互。本数据集包含2026年1月28日至2月17日期间采集的790万条帖子与360万条评论（其中5.7%采用线程式回复结构），涵盖5084个社区（submolts）。 ## 相关论文 **《智能体遭遇质疑时会修复对话，还是仅作回复？部署式智能体论坛中的质疑、修复与公开修正》** 作者：张路遥、朱亦云、王佳璐、李贝贝、Ramayya Krishnan [arXiv:2604.00518](https://arxiv.org/abs/2604.00518) ## 数据集说明 | 文件 | 行数 | 大小 | 描述 | |------|------|------|-------------| | `moltbook_comment.csv` | 3,552,190 | 1.5 GB | 所有带回复结构的评论 | | `moltbook_post.csv` | 约790万 | 608 MB | 所有帖子（顶层提交内容） | ### 评论数据模式 | 字段名 | 数据类型 | 描述 | |--------|------|-------------| | `comment_id` | 字符串 | 评论唯一标识符（通用唯一识别码UUID） | | `post_id` | 字符串 | 所属帖子的标识符 | | `parent_id` | 字符串 | 父评论ID（若直接回复帖子则为空） | | `comment_content` | 字符串 | 评论完整文本 | | `comment_created_at` | 字符串 | 时间戳（协调世界时UTC） | | `comment_author_id` | 字符串 | 作者标识符（UUID） | | `comment_author_name` | 字符串 | 作者显示名称 | | `comment_upvotes` | 浮点型 | 点赞数 | | `comment_downvotes` | 浮点型 | 点踩数 | | `comment_depth` | 浮点型 | 嵌套深度（0表示直接回复帖子） | ### 平台核心特性 - 仅约5.7%的评论拥有非空的`parent_id`字段，这体现了Moltbook的扁平化线程结构。 - `parent_id`为空的评论为直接回复原帖的内容（`comment_depth`=0）。 - 该特性属于平台原生设计，并非数据采集过程中的人为产物。 ## 使用方法 python from datasets import load_dataset # 加载评论 comments = load_dataset("zzzztom/moltbook-forum-data", data_files="moltbook_comment.csv", split="train") # 加载帖子 posts = load_dataset("zzzztom/moltbook-forum-data", data_files="moltbook_post.csv", split="train") # 筛选出论文中使用的5个核心社区 main_communities = ["philosophy", "ponderings", "todayilearned", "ai", "builds"] # 重构回复链 nested = comments.filter(lambda x: x["parent_id"] != "") ## Reddit对比数据集本论文将Moltbook与匹配的Reddit社区进行了对比。由于再分发限制，本次发布未包含Reddit相关数据。论文中使用的Reddit数据可从以下渠道获取： - [HuggingFaceGECLM/REDDIT_comments](https://huggingface.co/datasets/HuggingFaceGECLM/REDDIT_comments)（数据来源于Pushshift，涵盖50个Reddit子版块，时间范围为2006年至2023年）本研究在主要分析中采用了LLM普及前的时间窗口（2018年至2021年）。 ## 引用格式若您使用本数据集，请引用以下文献： bibtex @article{zhang2026agents, title={Do Agents Repair When Challenged---or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum}, author={Zhang, Luyang and Chu, Yi-Yun and Wang, Jialu and Li, Beibei and Krishnan, Ramayya}, journal={arXiv preprint arXiv:2604.00518}, year={2026} } ## 许可证本数据集采用[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)许可协议发布。

提供机构：

zzzztom

5,000+

优质数据集

54 个

任务类型

进入经典数据集