
seantw/DEBATE_LLM

Hugging Face | Updated 2025-11-19 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/seantw/DEBATE_LLM
Description:
# DEBATE Submit Data – Human Conversation Datasets

This repository contains CSV files from the DEBATE project: large-scale human conversation experiments organized around controversial and opinion-based topics.

The data consists of multi-round conversations between participants discussing political, social, and belief-related topics, following the protocol described in:

> Chuang et al., **“DEBATE: A Large-Scale Benchmark for Role-Playing LLM Agents in Multi-Agent, Long-Form Debates”**,
> 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA),
> arXiv preprint arXiv:2510.25110, 2025.

Paper Link: https://arxiv.org/pdf/2510.25110

## Directory Structure

```text
submit_data/
├── depth/                 # Phase 1: Depth topics (fewer topics, more conversations each)
│   ├── [topic_name]/      # Individual topic folders
│   │   ├── *.csv          # Conversation data files
│   │   └── ...
│   └── ...
├── breadth/               # Phase 2: Breadth topics (many topics, fewer conversations each)
│   ├── [topic_name]/      # Individual topic folders
│   │   ├── *.csv          # Conversation data files
│   │   └── ...
│   └── ...
└── README.md              # This file
```

## Data Organization

### Depth vs Breadth

- **Depth Topics**: Focused exploration of a smaller set of topics with multiple conversation sessions per topic
- **Breadth Topics**: Broad coverage across many different topics with fewer sessions per topic

### Topic Categories

The conversations cover a wide range of controversial and opinion-based topics including:

#### Depth Topics (Phase 1)

- For more information on topics, check Appendix.

#### Breadth Topics (Phase 2)

- For more information on topics, check Appendix.
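Given the directory layout above, the conversation files can be grouped by phase and topic with a short directory walk. This is a minimal sketch, assuming a local copy of `submit_data/` that follows the `<phase>/<topic_name>/*.csv` layout exactly; the function name `index_conversations` is hypothetical and not part of the dataset.

```python
from collections import defaultdict
from pathlib import Path


def index_conversations(root):
    """Map (phase, topic_name) -> sorted list of CSV paths found under root."""
    index = defaultdict(list)
    root = Path(root)
    for phase in ("depth", "breadth"):
        phase_dir = root / phase
        if not phase_dir.is_dir():
            # Tolerate partial downloads that contain only one phase.
            continue
        # Layout assumed from the README: <root>/<phase>/<topic_name>/*.csv
        for csv_path in sorted(phase_dir.glob("*/*.csv")):
            index[(phase, csv_path.parent.name)].append(csv_path)
    return dict(index)
```

Calling `index_conversations("submit_data")` returns a plain dictionary, so the number of sessions per topic is simply `len(paths)` for each key.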
## File Naming Convention

Each CSV file follows this naming pattern:

```
YYYYMMDD_HHMMSS_TOPIC_NAME_UNIQUE_ID_VERSION.csv
```

Where:

- `YYYYMMDD`: Date (Year/Month/Day)
- `HHMMSS`: Time (Hour/Minute/Second)
- `TOPIC_NAME`: Underscored topic description
- `UNIQUE_ID`: 26-character unique identifier
- `VERSION`: Version number (e.g., "0.0.1")

## Data File Structure

Each CSV file contains conversation data with the following key columns:

- **Event tracking**: `event_order`, `event_type`
- **Participants**: `worker_id`, `sender_id`, `recipient_id`
- **Content**: `text` (messages, opinions, slider values)
- **Conversation flow**: `chat_round_order`, `message_id`
- **User interaction**: `is_slider_changed` (opinion rating changes)

### Event Types

- `Initial Opinion`: Participant's starting position on the topic
- `tweet`: Short messages during conversation
- `message_sent`/`message_received`: Direct messages between participants

### Special Notation

- `[SLIDER_VALUE=X]`: Indicates participant's opinion rating (typically 1-5 scale)
- `[AUTOSUBMISSION DUE TO TIME LIMIT]`: System-generated due to timeout

## Data Usage

This dataset is suitable for research on:

- Opinion dynamics and persuasion
- Human-AI conversation patterns
- Political and social belief systems
- Argumentation and debate analysis
- Consensus building in controversial topics

## Data Quality

- Files contain real human conversation data
- Some conversations may be incomplete due to participant dropout
- Time limits may have caused automatic submissions
- **Processed data may contain empty rows**: Consecutive messages from the same user are concatenated and treated as a single message, which can result in empty rows in the processed dataset

## License & Usage Restrictions

This dataset is released under the: DEBATE Dataset Research-Only License (Non-Commercial, v1.0) (see the LICENSE file in this repository).
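The File Naming Convention and Special Notation sections above describe two small formats that are easy to parse. The sketch below is illustrative only and rests on assumptions the README does not confirm: the 26-character identifier is taken to be alphanumeric, the version to be dotted digits, and `X` in `[SLIDER_VALUE=X]` to be a plain number. The helper names (`parse_filename`, `extract_slider_values`, `is_autosubmitted`) are hypothetical.

```python
import re
from datetime import datetime

# Assumption: UNIQUE_ID is 26 alphanumeric characters and VERSION is dotted
# digits such as "0.0.1"; the README does not spell out either format.
FILENAME_RE = re.compile(
    r"^(?P<stamp>\d{8}_\d{6})_(?P<topic>.+)_(?P<uid>[A-Za-z0-9]{26})"
    r"_(?P<version>\d+(?:\.\d+)*)\.csv$"
)

# Assumption: X in [SLIDER_VALUE=X] is a plain (possibly decimal) number.
SLIDER_RE = re.compile(r"\[SLIDER_VALUE=(-?\d+(?:\.\d+)?)\]")
AUTOSUBMIT_TAG = "[AUTOSUBMISSION DUE TO TIME LIMIT]"


def parse_filename(name):
    """Split a CSV file name into its parts, or return None if it does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    try:
        timestamp = datetime.strptime(match["stamp"], "%Y%m%d_%H%M%S")
    except ValueError:
        # Digits in the right places but not a real date or time.
        return None
    return {
        "timestamp": timestamp,
        "topic": match["topic"],
        "unique_id": match["uid"],
        "version": match["version"],
    }


def extract_slider_values(text):
    """Return every slider rating embedded in a `text` cell, in order."""
    return [float(value) for value in SLIDER_RE.findall(text)]


def is_autosubmitted(text):
    """True if the cell carries the system-generated timeout marker."""
    return AUTOSUBMIT_TAG in text
```

Because `TOPIC_NAME` itself contains underscores, the pattern anchors on the fixed-width timestamp at the front and the identifier and version at the back, and treats everything in between as the topic.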
## Citation

Please cite the following work when using this dataset in your research:

```
@article{chuang2025debate,
  title={DEBATE: A Large-Scale Benchmark for Role-Playing LLM Agents in Multi-Agent, Long-Form Debates},
  author={Chuang, Yun-Shiuan and Tu, Ruixuan and Dai, Chengtao and Vasani, Smit and Yao, Binwei and Tessler, Michael Henry and Yang, Sijia and Shah, Dhavan V and Hawkins, Robert D and Hu, Junjie and others},
  journal={arXiv preprint arXiv:2510.25110},
  year={2025}
}
```
Provider:
seantw