snap-stanford/humanual-chat
收藏Hugging Face2026-02-13 更新2026-04-05 收录
下载链接:
https://hf-mirror.com/datasets/snap-stanford/humanual-chat
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc-by-nc-4.0
language:
- en
tags:
- user-simulation
- humanlm
- persona
- conversational-ai
pretty_name: Humanual-Chat
size_categories:
- 10K<n<100K
---
# Humanual-Chat
[](https://humanlm.stanford.edu)
[](https://humanlm.stanford.edu/HumanLM_paper.pdf)
[](https://github.com/zou-group/humanlm)
[](https://huggingface.co/collections/snap-stanford/humanual-datasets)
Conversations between users and LLM assistants of 5-10 turns, adapted from WildChat, simulating interactive user behaviors including follow-ups, goal changes, and clarification turns. This dataset is part of the **[HumanLM](https://humanlm.stanford.edu)** benchmark for training user simulators that accurately reflect real user behavior.
**Source:** WildChat · **Domain:** Conversational AI · **Date Range:** 2023-04-09 to 2024-04-29
The dataset contains **24,762** comments from **4,124** users across **4,145** posts, with an average of **7.27** turns per conversation. Each example includes the user's persona, conversation context, and ground-truth response.
**Splits:** train (23,141) · val (481) · test (1,140)
| Column | Description |
|--------|-------------|
| `prompt` | Multi-turn conversation history as a list of messages with `role` ("user"/"assistant") and `content` fields |
| `completion` | The ground-truth next user message to generate |
| `persona` | Brief user description (typically "A user who is chatting with an AI assistant") |
| `post_id` | Unique conversation ID |
| `user_id` | SHA-256 hashed user identifier (for privacy) |
| `timestamp` | Unix timestamp of the conversation |
| `turn_id` | Current turn number in the conversation |
| `metadata` | User metadata as JSON (language, country, state, etc.) |
| `post_metrics` | Conversation metrics (currently empty) |
## Quick Start
```python
from datasets import load_dataset
dataset = load_dataset("snap-stanford/humanual-chat")
sample = dataset["train"][0]
print(sample["persona"]) # User persona
print(sample["prompt"]) # Conversation context
print(sample["completion"]) # Ground-truth response
```
## Citation
```bibtex
@article{wu2026humanlm,
title={HUMANLM: Simulating Users with State Alignment Beats Response Imitation},
url={https://humanlm.stanford.edu/},
author={Wu, Shirley and Choi, Evelyn and Khatua, Arpandeep and Wang, Zhanghan and He-Yueya, Joy and Weerasooriya, Tharindu Cyril and Wei, Wei and Yang, Diyi and Leskovec, Jure and Zou, James},
year={2026}
}
```
Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
提供机构:
snap-stanford



