snap-stanford/humanual-chat

Name: snap-stanford/humanual-chat
Creator: snap-stanford
Published: 2026-02-13 04:25:23
License: 暂无描述

Hugging Face2026-02-13 更新2026-04-05 收录

下载链接：

https://hf-mirror.com/datasets/snap-stanford/humanual-chat

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: cc-by-nc-4.0 language: - en tags: - user-simulation - humanlm - persona - conversational-ai pretty_name: Humanual-Chat size_categories: - 10K<n<100K --- # Humanual-Chat [![Website](https://img.shields.io/badge/Website-humanlm.stanford.edu-blue)](https://humanlm.stanford.edu) [![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://humanlm.stanford.edu/HumanLM_paper.pdf) [![GitHub](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/zou-group/humanlm) [![Collection](https://img.shields.io/badge/HuggingFace-All_Datasets-yellow)](https://huggingface.co/collections/snap-stanford/humanual-datasets) Conversations between users and LLM assistants of 5-10 turns, adapted from WildChat, simulating interactive user behaviors including follow-ups, goal changes, and clarification turns. This dataset is part of the **[HumanLM](https://humanlm.stanford.edu)** benchmark for training user simulators that accurately reflect real user behavior. **Source:** WildChat · **Domain:** Conversational AI · **Date Range:** 2023-04-09 to 2024-04-29 The dataset contains **24,762** comments from **4,124** users across **4,145** posts, with an average of **7.27** turns per conversation. Each example includes the user's persona, conversation context, and ground-truth response. **Splits:** train (23,141) · val (481) · test (1,140) | Column | Description | |--------|-------------| | `prompt` | Multi-turn conversation history as a list of messages with `role` ("user"/"assistant") and `content` fields | | `completion` | The ground-truth next user message to generate | | `persona` | Brief user description (typically "A user who is chatting with an AI assistant") | | `post_id` | Unique conversation ID | | `user_id` | SHA-256 hashed user identifier (for privacy) | | `timestamp` | Unix timestamp of the conversation | | `turn_id` | Current turn number in the conversation | | `metadata` | User metadata as JSON (language, country, state, etc.) | | `post_metrics` | Conversation metrics (currently empty) | ## Quick Start ```python from datasets import load_dataset dataset = load_dataset("snap-stanford/humanual-chat") sample = dataset["train"][0] print(sample["persona"]) # User persona print(sample["prompt"]) # Conversation context print(sample["completion"]) # Ground-truth response ``` ## Citation ```bibtex @article{wu2026humanlm, title={HUMANLM: Simulating Users with State Alignment Beats Response Imitation}, url={https://humanlm.stanford.edu/}, author={Wu, Shirley and Choi, Evelyn and Khatua, Arpandeep and Wang, Zhanghan and He-Yueya, Joy and Weerasooriya, Tharindu Cyril and Wei, Wei and Yang, Diyi and Leskovec, Jure and Zou, James}, year={2026} } ``` Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).

提供机构：

snap-stanford

5,000+

优质数据集

54 个

任务类型

进入经典数据集