mutual_friends
Source: ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/stanfordnlp/mutual_friends
# Dataset Card for MutualFriends
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [COCOA](https://stanfordnlp.github.io/cocoa/)
- **Repository:** [GitHub repository](https://github.com/stanfordnlp/cocoa)
- **Paper:** [Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings (ACL 2017)](https://arxiv.org/abs/1704.07130)
- **Codalab:** [Codalab](https://worksheets.codalab.org/worksheets/0xc757f29f5c794e5eb7bfa8ca9c945573/)
### Dataset Summary
Our goal is to build systems that collaborate with people by exchanging information through natural language and reasoning over a structured knowledge base. In the MutualFriends task, two agents, A and B, each have a private knowledge base containing a list of friends with multiple attributes (e.g., name, school, major). The agents must chat with each other to find their unique mutual friend.
### Supported Tasks and Leaderboards
We consider two agents, each with a private knowledge base of items, who must communicate their knowledge to achieve a common goal. Specifically, we designed the MutualFriends task (see the figure below). Each agent has a list of friends with attributes such as school and major. They must chat with each other to find the unique mutual friend.
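To make the goal of the task concrete, the following is a minimal sketch of the target the agents are trying to reach: the friend that appears in both knowledge bases. It is an oracle that sees both knowledge bases at once, which the agents in the task cannot do, so it is not a dialogue agent. The function name `find_mutual_friends` and the dictionary representation of a friend are assumptions made for illustration; the sample values are borrowed from the example instance in this card, trimmed to two attributes.

```python
from typing import Dict, List, Tuple

Person = Dict[str, str]


def find_mutual_friends(kb_a: List[Person], kb_b: List[Person]) -> List[Person]:
    """Return the friends of kb_a that also appear, attribute for attribute, in kb_b."""

    def key(person: Person) -> Tuple[Tuple[str, str], ...]:
        # Sort the items so the comparison does not depend on attribute order.
        return tuple(sorted(person.items()))

    keys_b = {key(p) for p in kb_b}
    return [p for p in kb_a if key(p) in keys_b]


# Hypothetical, trimmed knowledge bases for agents A and B.
kb_a = [
    {"School": "Salisbury State University", "Company": "Molycorp"},
    {"School": "Sacred Heart University", "Company": "Molycorp"},
]
kb_b = [
    {"School": "National Technological University", "Company": "Molycorp"},
    {"School": "Salisbury State University", "Company": "Molycorp"},
]

mutual = find_mutual_friends(kb_a, kb_b)
print(mutual)
```

In a well-formed scenario the returned list should contain exactly one friend, since the task states that the mutual friend is unique.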
### Languages
The text in the dataset is in English. The associated BCP-47 code is `en`.
## Dataset Structure
### Data Instances
An example looks like this.
```
{
    'uuid': 'C_423324a5fff045d78bef75a6f295a3f4',
    'scenario_uuid': 'S_hvmRM4YNJd55ecT5',
    'scenario_alphas': [0.30000001192092896, 1.0, 1.0],
    'scenario_attributes': {
        'name': ['School', 'Company', 'Location Preference'],
        'unique': [False, False, False],
        'value_type': ['school', 'company', 'loc_pref']
    },
    'scenario_kbs': [
        [
            [['School', 'Company', 'Location Preference'], ['Longwood College', 'Alton Steel', 'indoor']],
            [['School', 'Company', 'Location Preference'], ['Salisbury State University', 'Leonard Green & Partners', 'indoor']],
            [['School', 'Company', 'Location Preference'], ['New Mexico Highlands University', 'Crazy Eddie', 'indoor']],
            [['School', 'Company', 'Location Preference'], ['Rhodes College', "Tully's Coffee", 'indoor']],
            [['School', 'Company', 'Location Preference'], ['Sacred Heart University', 'AMR Corporation', 'indoor']],
            [['School', 'Company', 'Location Preference'], ['Salisbury State University', 'Molycorp', 'indoor']],
            [['School', 'Company', 'Location Preference'], ['New Mexico Highlands University', 'The Hartford Financial Services Group', 'indoor']],
            [['School', 'Company', 'Location Preference'], ['Sacred Heart University', 'Molycorp', 'indoor']],
            [['School', 'Company', 'Location Preference'], ['Babson College', 'The Hartford Financial Services Group', 'indoor']]
        ],
        [
            [['School', 'Company', 'Location Preference'], ['National Technological University', 'Molycorp', 'indoor']],
            [['School', 'Company', 'Location Preference'], ['Fairmont State College', 'Leonard Green & Partners', 'outdoor']],
            [['School', 'Company', 'Location Preference'], ['Johnson C. Smith University', 'Data Resources Inc.', 'outdoor']],
            [['School', 'Company', 'Location Preference'], ['Salisbury State University', 'Molycorp', 'indoor']],
            [['School', 'Company', 'Location Preference'], ['Fairmont State College', 'Molycorp', 'outdoor']],
            [['School', 'Company', 'Location Preference'], ['University of South Carolina - Aiken', 'Molycorp', 'indoor']],
            [['School', 'Company', 'Location Preference'], ['University of South Carolina - Aiken', 'STX', 'outdoor']],
            [['School', 'Company', 'Location Preference'], ['National Technological University', 'STX', 'outdoor']],
            [['School', 'Company', 'Location Preference'], ['Johnson C. Smith University', 'Rockstar Games', 'indoor']]
        ]
    ],
    'agents': {
        '0': 'human',
        '1': 'human'
    },
    'outcome_reward': 1,
    'events': {
        'actions': ['message', 'message', 'message', 'message', 'select', 'select'],
        'agents': [1, 1, 0, 0, 1, 0],
        'data_messages': ['Hello', 'Do you know anyone who works at Molycorp?', 'Hi. All of my friends like the indoors.', 'Ihave two friends that work at Molycorp. They went to Salisbury and Sacred Heart.', '', ''],
        'data_selects': {
            'attributes': [
                [], [], [], [], ['School', 'Company', 'Location Preference'], ['School', 'Company', 'Location Preference']
            ],
            'values': [
                [], [], [], [], ['Salisbury State University', 'Molycorp', 'indoor'], ['Salisbury State University', 'Molycorp', 'indoor']
            ]
        },
        'start_times': [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0],
        'times': [1480737280.0, 1480737280.0, 1480737280.0, 1480737280.0, 1480737280.0, 1480737280.0]
    },
}
```
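Because the `events` field stores each turn as parallel lists, it can be easier to read once replayed as a transcript. The sketch below is one possible way to do that, assuming the layout shown in the instance above. The helper name `render_transcript` and the output format are illustrative choices, and the sample `events` dictionary is a shortened version of the one in the instance.

```python
from typing import Any, Dict, List


def render_transcript(events: Dict[str, Any]) -> List[str]:
    """Turn the parallel lists in `events` into one readable line per turn."""
    lines = []
    selects = events["data_selects"]
    for i, action in enumerate(events["actions"]):
        speaker = f"agent {events['agents'][i]}"
        if action == "message":
            lines.append(f"{speaker}: {events['data_messages'][i]}")
        elif action == "select":
            # Pair attribute names with values to rebuild the selected item.
            chosen = dict(zip(selects["attributes"][i], selects["values"][i]))
            lines.append(f"{speaker} selects {chosen}")
    return lines


# Shortened sample, following the structure of the instance above.
events = {
    "actions": ["message", "message", "select", "select"],
    "agents": [1, 0, 1, 0],
    "data_messages": [
        "Do you know anyone who works at Molycorp?",
        "Hi. All of my friends like the indoors.",
        "",
        "",
    ],
    "data_selects": {
        "attributes": [[], [], ["School", "Company"], ["School", "Company"]],
        "values": [
            [], [],
            ["Salisbury State University", "Molycorp"],
            ["Salisbury State University", "Molycorp"],
        ],
    },
}

transcript = render_transcript(events)
for line in transcript:
    print(line)
```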
### Data Fields
- `uuid`: example id.
- `scenario_uuid`: scenario id.
- `scenario_alphas`: scenario alphas.
- `scenario_attributes`: all the attributes considered in the scenario. The dictionaries are linearized: to reconstruct the dictionary of the i-th attribute, extract the i-th elements of `unique`, `value_type`, and `name`.
  - `unique`: bool.
  - `value_type`: code/type of the attribute.
  - `name`: name of the attribute.
- `scenario_kbs`: descriptions of the persons present in the two users' databases. This is a list of two knowledge bases (one for each user in the dialogue). `scenario_kbs[i]` is a list of persons. Each person is represented as two lists (one for attribute names and the other for attribute values). The j-th element of the attribute names corresponds to the j-th element of the attribute values (linearized dictionary).
- `agents`: the two users engaged in the dialogue.
- `outcome_reward`: reward of the present dialogue.
- `events`: dictionary describing the dialogue. The j-th element of each sub-field describes the j-th turn.
  - `actions`: type of turn (either `message` or `select`).
  - `agents`: which agent (0 or 1) takes the turn.
  - `data_messages`: the message text if the action is `message`; otherwise an empty string.
  - `data_selects`: the user's selection if the action is `select`; otherwise an empty selection.
  - `start_times`: always -1 in this dataset.
  - `times`: sending time.
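The linearized dictionaries described above can be rebuilt with `zip`. The following is a minimal sketch under the field layout documented in this card; the helper names `unpack_attributes` and `unpack_kb` are hypothetical, and the sample values are copied from the example instance.

```python
from typing import Any, Dict, List


def unpack_attributes(scenario_attributes: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Rebuild one dictionary per attribute from the parallel lists."""
    keys = list(scenario_attributes)
    columns = (scenario_attributes[k] for k in keys)
    # zip(*columns) yields the i-th element of every list together.
    return [dict(zip(keys, row)) for row in zip(*columns)]


def unpack_kb(kb: List[List[List[str]]]) -> List[Dict[str, str]]:
    """Rebuild one dictionary per person from [attribute names, attribute values] pairs."""
    return [dict(zip(names, values)) for names, values in kb]


scenario_attributes = {
    "name": ["School", "Company", "Location Preference"],
    "unique": [False, False, False],
    "value_type": ["school", "company", "loc_pref"],
}
kb = [
    [["School", "Company", "Location Preference"],
     ["Longwood College", "Alton Steel", "indoor"]],
    [["School", "Company", "Location Preference"],
     ["Salisbury State University", "Molycorp", "indoor"]],
]

attributes = unpack_attributes(scenario_attributes)
people = unpack_kb(kb)
print(attributes[0])
print(people[0])
```

Applied to a full example, `unpack_kb` would be called once per entry of `scenario_kbs`, giving one list of persons for each of the two users.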
### Data Splits
There are 8967 dialogues for training, 1083 for validation and 1107 for testing.
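As a quick sanity check on these figures, the split sizes can be summed and expressed as shares of the whole. The dictionary keys below are labels chosen for illustration; only the three counts come from this card.

```python
# Split sizes as stated in this card; the key names are illustrative labels.
splits = {"train": 8967, "validation": 1083, "test": 1107}

total = sum(splits.values())
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(total)   # 11157
print(shares)  # {'train': 80.4, 'validation': 9.7, 'test': 9.9}
```

The total of 11157 dialogues agrees with the "11K human-human dialogues" mentioned in the paper abstract quoted under Citation Information.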
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
[More Information Needed]
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
[More Information Needed]
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
```
@inproceedings{he-etal-2017-learning,
title = "Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings",
author = "He, He and
Balakrishnan, Anusha and
Eric, Mihail and
Liang, Percy",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2017",
address = "Vancouver, Canada",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P17-1162",
doi = "10.18653/v1/P17-1162",
pages = "1766--1776",
abstract = "We study a \textit{symmetric collaborative dialogue} setting in which two agents, each with private knowledge, must strategically communicate to achieve a common goal. The open-ended dialogue state in this setting poses new challenges for existing dialogue systems. We collected a dataset of 11K human-human dialogues, which exhibits interesting lexical, semantic, and strategic elements. To model both structured knowledge and unstructured language, we propose a neural model with dynamic knowledge graph embeddings that evolve as the dialogue progresses. Automatic and human evaluations show that our model is both more effective at achieving the goal and more human-like than baseline neural and rule-based models.",
}
```
### Contributions
Thanks to [@VictorSanh](https://github.com/VictorSanh) for adding this dataset.
Provider: maas
Created: 2025-10-03



