ai_society_translated
ModelScope community · updated 2026-01-08 · indexed 2025-09-06
Download link:
https://modelscope.cn/datasets/camel-ai/ai_society_translated
Description:
# **CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society**
- **GitHub:** https://github.com/lightaime/camel
- **Website:** https://www.camel-ai.org/
- **arXiv Paper:** https://arxiv.org/abs/2303.17760
## Dataset Summary
The original AI Society dataset is in English and consists of 25K conversations between two gpt-3.5-turbo agents. It was obtained by running role-playing for every combination of 50 user roles and 50 assistant roles, with each combination run on 10 tasks.
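As a quick sanity check on the 25K figure, the arithmetic can be sketched as below. The constant names are illustrative only, and the sketch assumes exactly 10 tasks per role combination.

```python
from itertools import product

# Counts taken from the dataset summary; the names are illustrative.
NUM_ASSISTANT_ROLES = 50
NUM_USER_ROLES = 50
TASKS_PER_COMBINATION = 10  # assumption: exactly 10 tasks per role pair

# Every (assistant role, user role) pairing.
combinations = list(product(range(NUM_ASSISTANT_ROLES), range(NUM_USER_ROLES)))
total_conversations = len(combinations) * TASKS_PER_COMBINATION

print(len(combinations), total_conversations)  # 2500 25000
```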
We provide translations of the original English dataset into ten languages: Arabic, Chinese, Korean, Japanese, Hindi, Russian, Spanish, French, German, and Italian, each in ".zip" format.
The dataset was translated by prompting gpt-3.5-turbo to translate the presented sentences into a particular language.
**Note:** Sometimes gpt decides not to translate particular keywords such as "Instruction", "Input", and "Solution". Therefore, cleaning might be needed depending on your use case.
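One possible first step for such cleaning is to flag messages that still contain the English keywords. The sketch below is only an illustration: the function name is hypothetical, and a plain substring check is used because word-boundary matching is unreliable next to CJK characters.

```python
# Keywords that the note above says are sometimes left untranslated.
UNTRANSLATED_KEYWORDS = ("Instruction", "Input", "Solution")

def find_untranslated_keywords(text: str) -> list[str]:
    """Return the English keywords that still appear in a translated message.

    Hypothetical helper: case-sensitive substring matching, results in the
    order of UNTRANSLATED_KEYWORDS.
    """
    return [keyword for keyword in UNTRANSLATED_KEYWORDS if keyword in text]

print(find_untranslated_keywords("Instruction: 请总结这段文字。\nInput: 无"))
# ['Instruction', 'Input']
```

For French, "Instruction" and "Solution" are also valid target-language words, so a hit there does not necessarily indicate a missed translation.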
## Data Fields
**The data fields for chat format (`ai_society_chat_{language}.zip`) are as follows:**
* `input`: {assistant\_role\_index}\_{user\_role\_index}\_{task\_index}. For example, `001_002_003` refers to assistant role 1, user role 2, and task 3 in our assistant role names, user role names, and task text files.
* `role_1`: assistant role
* `role_2`: user role
* `original_task`: the general task assigned to the assistant and user to cooperate on.
* `specified_task`: the task after it has been refined by the task specifier; it is more specific than the original task.
* `message_k`: refers to the k<sup>_th_</sup> message of the conversation.
    * `role_type`: refers to whether the agent is an assistant or a user.
    * `role_name`: refers to the assigned assistant/user role.
    * `role`: refers to the role of the agent in the message, as used by the OpenAI API (usually not needed).
    * `content`: refers to the content of the message.
* `termination_reason`: refers to the reason the chat was terminated.
* `num_messages`: refers to the total number of messages in the chat.
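To illustrate how these fields might be consumed, here is a minimal sketch that splits an `input` identifier into its three indices and collects the `message_k` entries in numeric order. It assumes each conversation is available as a Python dict keyed by the field names above; the helper names and the sample record are hypothetical.

```python
import re

def parse_conversation_id(conversation_id: str) -> dict:
    """Split an ID such as '001_002_003' into its three integer indices."""
    assistant, user, task = (int(part) for part in conversation_id.split("_"))
    return {
        "assistant_role_index": assistant,
        "user_role_index": user,
        "task_index": task,
    }

def ordered_messages(record: dict) -> list:
    """Return the values of all message_k fields, sorted by k numerically."""
    pattern = re.compile(r"message_(\d+)")
    numbered = []
    for key, value in record.items():
        match = pattern.fullmatch(key)
        if match:
            numbered.append((int(match.group(1)), value))
    numbered.sort(key=lambda pair: pair[0])
    return [message for _, message in numbered]

# Hypothetical record, for illustration only.
sample = {
    "input": "001_002_003",
    "message_2": {"role_type": "ASSISTANT", "content": "second"},
    "message_10": {"role_type": "ASSISTANT", "content": "tenth"},
    "message_1": {"role_type": "USER", "content": "first"},
    "num_messages": 3,
}
print(parse_conversation_id(sample["input"]))
print([m["content"] for m in ordered_messages(sample)])
```

Sorting on the integer `k` rather than on the key string keeps `message_10` after `message_2`.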
**Download in Python**
```
from huggingface_hub import hf_hub_download
# Set `language` to one of: ar, zh, ko, ja, hi, ru, es, fr, de, it
language = "zh"
# Download the archive for the chosen language into datasets/
hf_hub_download(repo_id="camel-ai/ai_society_translated", repo_type="dataset",
                filename=f"ai_society_chat_{language}.zip",
                local_dir="datasets/", local_dir_use_symlinks=False)
```
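Once an archive has been downloaded, its contents could be read without unpacking it to disk. The sketch below assumes the archive holds JSON files, which is not confirmed by this card; the function name is hypothetical.

```python
import json
import zipfile

def iter_conversations(zip_path):
    """Yield (member_name, parsed_json) for every .json member of the archive.

    Assumption: conversations are stored as JSON files inside the zip.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                with archive.open(name) as handle:
                    yield name, json.load(handle)
```

For example, `next(iter_conversations("datasets/ai_society_chat_zh.zip"))` would return the first conversation in the Chinese archive, provided the file exists at that path.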
### Citation
```
@misc{li2023camel,
title={CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society},
author={Guohao Li and Hasan Abed Al Kader Hammoud and Hani Itani and Dmitrii Khizbullin and Bernard Ghanem},
year={2023},
eprint={2303.17760},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
```
## Disclaimer:
This data was synthetically generated by gpt-3.5-turbo and might contain incorrect information. The dataset is intended for research purposes only.
---
license: cc-by-nc-4.0
---
Provider:
maas
Created:
2025-09-04



