five

snap-stanford/humanual-opinion

收藏
Hugging Face2026-02-13 更新2026-04-05 收录
下载链接:
https://hf-mirror.com/datasets/snap-stanford/humanual-opinion
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: cc-by-nc-4.0 language: - en tags: - user-simulation - humanlm - persona - social-media-and-opinion pretty_name: Humanual-Opinion size_categories: - 10K<n<100K --- # Humanual-Opinion [![Website](https://img.shields.io/badge/Website-humanlm.stanford.edu-blue)](https://humanlm.stanford.edu) [![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://humanlm.stanford.edu/HumanLM_paper.pdf) [![GitHub](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/zou-group/humanlm) [![Collection](https://img.shields.io/badge/HuggingFace-All_Datasets-yellow)](https://huggingface.co/collections/snap-stanford/humanual-datasets) Reddit users expressing opinions across diverse personal-issue threads (r/AITA), reflecting moral standards on controversial topics like family conflicts and life decisions. This dataset is part of the **[HumanLM](https://humanlm.stanford.edu)** benchmark for training user simulators that accurately reflect real user behavior. **Source:** Reddit r/AITA via asyncpraw · **Domain:** Social Media & Opinion · **Date Range:** 2018-11-12 to 2025-09-08 The dataset contains **42,332** comments from **4,567** users across **992** posts, with an average of **3.55** turns per conversation. Each example includes the user's persona, conversation context, and ground-truth response. **Splits:** train (37,791) · val (1,177) · test (3,364) | Column | Description | |--------|-------------| | `prompt` | Reddit post and parent comments as a list of messages with `role` and `content` fields | | `completion` | The ground-truth user comment to generate | | `persona` | User's commenting history and stance patterns on r/AITA | | `post_id` | Reddit post ID | | `user_id` | SHA-256 hashed Reddit username (for privacy) | | `timestamp` | Unix timestamp of when the comment was posted | | `turn_id` | Depth in the comment thread (1 = direct reply to post) | | `metadata` | Reddit metadata as JSON (subreddit, score, awards, etc.) | ## Quick Start ```python from datasets import load_dataset dataset = load_dataset("snap-stanford/humanual-opinion") sample = dataset["train"][0] print(sample["persona"]) # User persona print(sample["prompt"]) # Conversation context print(sample["completion"]) # Ground-truth response ``` ## Citation ```bibtex @article{wu2026humanlm, title={HUMANLM: Simulating Users with State Alignment Beats Response Imitation}, url={https://humanlm.stanford.edu/}, author={Wu, Shirley and Choi, Evelyn and Khatua, Arpandeep and Wang, Zhanghan and He-Yueya, Joy and Weerasooriya, Tharindu Cyril and Wei, Wei and Yang, Diyi and Leskovec, Jure and Zou, James}, year={2026} } ``` Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
提供机构:
snap-stanford
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作