seantw/DEBATE_LLM
Source: Hugging Face (updated 2025-11-19, indexed 2025-12-20)
Download link: https://hf-mirror.com/datasets/seantw/DEBATE_LLM
Resource description:
# DEBATE Submit Data – Human Conversation Datasets
This repository contains CSV files from the DEBATE project: large-scale
human conversation experiments organized around controversial and
opinion-based topics. The data consists of multi-round conversations
between participants discussing political, social, and belief-related
topics, following the protocol described in:
> Chuang et al., **“DEBATE: A Large-Scale Benchmark for Role-Playing LLM Agents in Multi-Agent, Long-Form Debates”**,
> 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA),
> arXiv preprint arXiv:2510.25110, 2025.
Paper Link: https://arxiv.org/pdf/2510.25110
## Directory Structure
```text
submit_data/
├── depth/ # Phase 1: Depth topics (fewer topics, more conversations each)
│ ├── [topic_name]/ # Individual topic folders
│ │ ├── *.csv # Conversation data files
│ │ └── ...
│ └── ...
├── breadth/ # Phase 2: Breadth topics (many topics, fewer conversations each)
│ ├── [topic_name]/ # Individual topic folders
│ │ ├── *.csv # Conversation data files
│ │ └── ...
│ └── ...
└── README.md # This file
```
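The layout above can be indexed with a short script. The following is a minimal sketch, assuming the repository has been downloaded locally and that every CSV sits exactly one level below `depth/` or `breadth/`, as the tree shows. The function name `index_csv_files` is hypothetical and is not part of the dataset.

```python
from collections import defaultdict
from pathlib import Path


def index_csv_files(root):
    """Map (phase, topic) -> sorted list of CSV paths found under `root`."""
    index = defaultdict(list)
    root = Path(root)
    for phase in ("depth", "breadth"):
        phase_dir = root / phase
        if not phase_dir.is_dir():
            continue  # tolerate a partial download
        # One topic folder per directory; CSV files sit directly inside it.
        for csv_path in sorted(phase_dir.glob("*/*.csv")):
            index[(phase, csv_path.parent.name)].append(csv_path)
    return dict(index)


if __name__ == "__main__":
    # "submit_data" is the root shown in the tree; adjust to your local path.
    for (phase, topic), paths in sorted(index_csv_files("submit_data").items()):
        print(f"{phase}/{topic}: {len(paths)} file(s)")
```

Keying the index by `(phase, topic)` keeps the depth and breadth phases separate even if a topic name appears in both.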
## Data Organization
### Depth vs Breadth
- **Depth Topics**: Focused exploration of a smaller set of topics with multiple conversation sessions per topic
- **Breadth Topics**: Broad coverage across many different topics with fewer sessions per topic
### Topic Categories
The conversations cover a wide range of controversial and opinion-based topics.
#### Depth Topics (Phase 1)
- For the list of topics, see the Appendix.
#### Breadth Topics (Phase 2)
- For the list of topics, see the Appendix.
## File Naming Convention
Each CSV file follows this naming pattern:
```
YYYYMMDD_HHMMSS_TOPIC_NAME_UNIQUE_ID_VERSION.csv
```
Where:
- `YYYYMMDD`: Date (Year/Month/Day)
- `HHMMSS`: Time (Hour/Minute/Second)
- `TOPIC_NAME`: Underscored topic description
- `UNIQUE_ID`: 26-character unique identifier
- `VERSION`: Version number (e.g., "0.0.1")
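The naming pattern can be split into its parts with a regular expression. This is a sketch under stated assumptions: the 26-character identifier is taken to be alphanumeric and the version to be dot-separated digits, neither of which the README spells out. `parse_filename` and the example file name are hypothetical.

```python
import re
from datetime import datetime

# Assumptions: UNIQUE_ID is 26 alphanumeric characters; VERSION is digits and dots.
FILENAME_RE = re.compile(
    r"^(?P<date>\d{8})_(?P<time>\d{6})_(?P<topic>.+)_"
    r"(?P<uid>[0-9A-Za-z]{26})_(?P<version>\d+(?:\.\d+)*)\.csv$"
)


def parse_filename(name):
    """Return the parts of a DEBATE CSV file name, or None if it does not match."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    try:
        timestamp = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S")
    except ValueError:
        return None  # digits present but not a valid date/time
    return {
        "timestamp": timestamp,
        "topic": m["topic"],  # may itself contain underscores
        "unique_id": m["uid"],
        "version": m["version"],
    }


if __name__ == "__main__":
    # Invented file name, for illustration only.
    fake_id = "0123456789ABCDEFGHJKMNPQRS"
    print(parse_filename(f"20240115_143022_some_topic_{fake_id}_0.0.1.csv"))
```

Because `TOPIC_NAME` contains underscores, the pattern anchors on the fixed-width date, time, and identifier fields rather than splitting on `_`.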
## Data File Structure
Each CSV file contains conversation data with the following key columns:
- **Event tracking**: `event_order`, `event_type`
- **Participants**: `worker_id`, `sender_id`, `recipient_id`
- **Content**: `text` (messages, opinions, slider values)
- **Conversation flow**: `chat_round_order`, `message_id`
- **User interaction**: `is_slider_changed` (opinion rating changes)
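As an illustration of how these columns might be read, the sketch below loads one file with the standard `csv` module and orders rows by `event_order`. It assumes `event_order` holds numeric values and that the file has a header row. The helper `read_events` and the sample rows are invented for the example.

```python
import csv
import io


def read_events(csv_file):
    """Read rows as dicts and sort them by `event_order` (assumed numeric)."""
    rows = list(csv.DictReader(csv_file))
    return sorted(rows, key=lambda row: float(row["event_order"]))


if __name__ == "__main__":
    # Invented sample data; real files contain more columns.
    sample = io.StringIO(
        "event_order,event_type,worker_id,text\n"
        "2,tweet,w1,second\n"
        "1,Initial Opinion,w1,first\n"
    )
    for row in read_events(sample):
        print(row["event_order"], row["event_type"], row["text"])
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `read_events`.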
### Event Types
- `Initial Opinion`: Participant's starting position on the topic
- `tweet`: Short messages during conversation
- `message_sent`/`message_received`: Direct messages between participants
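The event types listed above lend themselves to simple filtering. The following sketch counts rows per `event_type` and selects rows of chosen types. It assumes rows are dictionaries keyed by column name (for example, from `csv.DictReader`) and that the type strings appear exactly as written in this README. Both function names are hypothetical.

```python
from collections import Counter


def count_event_types(rows):
    """Count how many rows carry each `event_type` value."""
    return Counter(row["event_type"] for row in rows)


def select_events(rows, *event_types):
    """Return the rows whose `event_type` is one of `event_types`, in order."""
    wanted = set(event_types)
    return [row for row in rows if row["event_type"] in wanted]


if __name__ == "__main__":
    # Invented rows, for illustration only.
    rows = [
        {"event_type": "Initial Opinion", "text": "I agree."},
        {"event_type": "tweet", "text": "Here is why."},
        {"event_type": "message_sent", "text": "What do you think?"},
    ]
    print(count_event_types(rows))
    print(select_events(rows, "tweet", "message_sent"))
```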
### Special Notation
- `[SLIDER_VALUE=X]`: The participant's opinion rating (typically on a 1-5 scale)
- `[AUTOSUBMISSION DUE TO TIME LIMIT]`: System-generated marker for a response submitted automatically when the time limit expired
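These markers can be pulled out of the `text` column with a small parser. This is a sketch, assuming the markers appear literally as shown and that `X` is numeric; where in the text the markers sit is not specified here, so the code searches the whole string. `parse_text` is a hypothetical helper.

```python
import re

SLIDER_RE = re.compile(r"\[SLIDER_VALUE=([^\]]+)\]")
AUTOSUBMIT_TAG = "[AUTOSUBMISSION DUE TO TIME LIMIT]"


def parse_text(text):
    """Split a `text` cell into slider value, autosubmission flag, and plain text."""
    slider = None
    match = SLIDER_RE.search(text)
    if match:
        try:
            slider = float(match.group(1))
        except ValueError:
            slider = None  # marker present but value not numeric
    cleaned = SLIDER_RE.sub("", text).replace(AUTOSUBMIT_TAG, "").strip()
    return {
        "slider_value": slider,
        "autosubmitted": AUTOSUBMIT_TAG in text,
        "text": cleaned,
    }


if __name__ == "__main__":
    # Invented example strings.
    print(parse_text("I mostly agree. [SLIDER_VALUE=4]"))
    print(parse_text("[AUTOSUBMISSION DUE TO TIME LIMIT]"))
```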
## Data Usage
This dataset is suitable for research on:
- Opinion dynamics and persuasion
- Human-AI conversation patterns
- Political and social belief systems
- Argumentation and debate analysis
- Consensus building in controversial topics
## Data Quality
- Files contain real human conversation data
- Some conversations may be incomplete due to participant dropout
- Time limits may have caused automatic submissions
- **Processed data may contain empty rows**: Consecutive messages from the same user are concatenated and treated as a single message, which can result in empty rows in the processed dataset
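Empty rows of this kind can be filtered out before analysis. The sketch below assumes an "empty row" means every field is blank or missing; if only the `text` field is empty in your files, the check would need to be narrowed to that column. `drop_blank_rows` is a hypothetical helper operating on dictionaries such as those produced by `csv.DictReader`.

```python
def is_blank_row(row):
    """True if every value in the row is missing or whitespace-only."""
    return all(not str(value or "").strip() for value in row.values())


def drop_blank_rows(rows):
    """Return the rows that contain at least one non-blank field."""
    return [row for row in rows if not is_blank_row(row)]


if __name__ == "__main__":
    # Invented rows, for illustration only.
    rows = [
        {"event_type": "tweet", "text": "First part. Second part."},
        {"event_type": "", "text": ""},
        {"event_type": None, "text": "  "},
    ]
    print(drop_blank_rows(rows))
```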
## License & Usage Restrictions
This dataset is released under the DEBATE Dataset Research-Only License (Non-Commercial, v1.0); see the LICENSE file in this repository.
## Citation
Please cite the following work when using this dataset in your research:
```
@article{chuang2025debate,
title={DEBATE: A Large-Scale Benchmark for Role-Playing LLM Agents in Multi-Agent, Long-Form Debates},
author={Chuang, Yun-Shiuan and Tu, Ruixuan and Dai, Chengtao and Vasani, Smit and Yao, Binwei and Tessler, Michael Henry and Yang, Sijia and Shah, Dhavan V and Hawkins, Robert D and Hu, Junjie and others},
journal={arXiv preprint arXiv:2510.25110},
year={2025}
}
```
Provider:
seantw



