coached_conv_pref

Name: coached_conv_pref
Creator: maas
Published: 2025-12-05 16:41:04
License: No description available

ModelScope Community. Updated 2025-12-05. Indexed 2025-07-12.

Download link:

https://modelscope.cn/datasets/google-research-datasets/coached_conv_pref


Description:

# Dataset Card for Coached Conversational Preference Elicitation

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [Coached Conversational Preference Elicitation Homepage](https://research.google/tools/datasets/coached-conversational-preference-elicitation/)
- **Repository:** [Coached Conversational Preference Elicitation Repository](https://github.com/google-research-datasets/ccpe)
- **Paper:** [Aclweb](https://www.aclweb.org/anthology/W19-5941/)

### Dataset Summary

A dataset consisting of 502 English dialogs with 12,000 annotated utterances between a user and an assistant discussing movie preferences in natural language. It was collected using a Wizard-of-Oz methodology between two paid crowd-workers, where one worker plays the role of an 'assistant', while the other plays the role of a 'user'. The 'assistant' elicits the 'user’s' preferences about movies following a Coached Conversational Preference Elicitation (CCPE) method.
The assistant asks questions designed to minimize the bias in the terminology the 'user' employs to convey his or her preferences as much as possible, and to obtain these preferences in natural language. Each dialog is annotated with entity mentions, preferences expressed about entities, descriptions of entities provided, and other statements of entities.

### Supported Tasks and Leaderboards

* `other-other-Conversational Recommendation`: The dataset can be used to train a model for conversational recommendation, here framed as Coached Conversational Preference Elicitation.

### Languages

The text in the dataset is in English. The associated BCP-47 code is `en`.

## Dataset Structure

### Data Instances

A typical data point comprises a series of utterances between the 'assistant' and the 'user'. Each utterance is annotated with the categories described under Data Fields.

An example from the Coached Conversational Preference Elicitation dataset looks as follows:

```
{'conversationId': 'CCPE-6faee',
 'utterances': {
   'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
   'segments': [
     {'annotations': [{'annotationType': [], 'entityType': []}],
      'endIndex': [0], 'startIndex': [0], 'text': ['']},
     {'annotations': [{'annotationType': [0], 'entityType': [0]}, {'annotationType': [1], 'entityType': [0]}],
      'endIndex': [20, 27], 'startIndex': [14, 0], 'text': ['comedy', 'I really like comedy movies']},
     {'annotations': [{'annotationType': [0], 'entityType': [0]}],
      'endIndex': [24], 'startIndex': [16], 'text': ['comedies']},
     {'annotations': [{'annotationType': [1], 'entityType': [0]}],
      'endIndex': [15], 'startIndex': [0], 'text': ['I love to laugh']},
     {'annotations': [{'annotationType': [], 'entityType': []}],
      'endIndex': [0], 'startIndex': [0], 'text': ['']},
     {'annotations': [{'annotationType': [0], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}],
      'endIndex': [21, 21], 'startIndex': [8, 0], 'text': ['Step Brothers', 'I liked Step Brothers']},
     {'annotations': [{'annotationType': [], 'entityType': []}],
      'endIndex': [0], 'startIndex': [0], 'text': ['']},
     {'annotations': [{'annotationType': [1], 'entityType': [1]}],
      'endIndex': [32], 'startIndex': [0], 'text': ['Had some amazing one-liners that']},
     {'annotations': [{'annotationType': [], 'entityType': []}],
      'endIndex': [0], 'startIndex': [0], 'text': ['']},
     {'annotations': [{'annotationType': [0], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}],
      'endIndex': [15, 15], 'startIndex': [13, 0], 'text': ['RV', "I don't like RV"]},
     {'annotations': [{'annotationType': [], 'entityType': []}],
      'endIndex': [0], 'startIndex': [0], 'text': ['']},
     {'annotations': [{'annotationType': [1], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}],
      'endIndex': [48, 66], 'startIndex': [18, 50], 'text': ['It was just so slow and boring', "I didn't like it"]},
     {'annotations': [{'annotationType': [0], 'entityType': [1]}],
      'endIndex': [63], 'startIndex': [33], 'text': ['Jurassic World: Fallen Kingdom']},
     {'annotations': [{'annotationType': [0], 'entityType': [1]}, {'annotationType': [3], 'entityType': [1]}],
      'endIndex': [52, 52], 'startIndex': [22, 0],
      'text': ['Jurassic World: Fallen Kingdom', 'I have seen the movie Jurassic World: Fallen Kingdom']},
     {'annotations': [{'annotationType': [], 'entityType': []}],
      'endIndex': [0], 'startIndex': [0], 'text': ['']},
     {'annotations': [{'annotationType': [1], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}],
      'endIndex': [24, 125, 161], 'startIndex': [0, 95, 135],
      'text': ['I really like the actors', 'I just really like the scenery', 'the dinosaurs were awesome']}],
   'speaker': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
   'text': ['What kinds of movies do you like?',
            'I really like comedy movies.',
            'Why do you like comedies?',
            "I love to laugh and comedy movies, that's their whole purpose. Make you laugh.",
            'Alright, how about a movie you liked?',
            'I liked Step Brothers.',
            'Why did you like that movie?',
            'Had some amazing one-liners that still get used today even though the movie was made awhile ago.',
            'Well, is there a movie you did not like?',
            "I don't like RV.",
            'Why not?',
            "And I just didn't It was just so slow and boring. I didn't like it.",
            'Ok, then have you seen the movie Jurassic World: Fallen Kingdom',
            'I have seen the movie Jurassic World: Fallen Kingdom.',
            'What is it about these kinds of movies that you like or dislike?',
            'I really like the actors. I feel like they were doing their best to make the movie better. And I just really like the scenery, and the the dinosaurs were awesome.']}}
```

### Data Fields

Each conversation has the following fields:

* `conversationId`: A unique random ID for the conversation. The ID has no meaning.
* `utterances`: An array of utterances by the workers.

Each utterance has the following fields:

* `index`: A 0-based index indicating the order of the utterances in the conversation.
* `speaker`: Either USER or ASSISTANT, indicating which role generated this utterance. In the example above the roles appear as integers (0 on the user's turns, 1 on the assistant's).
* `text`: The raw text as written by the ASSISTANT, or transcribed from the spoken recording of USER.
* `segments`: An array of semantic annotations of spans in the text.

Each semantic annotation segment has the following fields:

* `startIndex`: The position of the start of the annotation in the utterance text.
* `endIndex`: The position of the end of the annotation in the utterance text. In the example above it is exclusive, so `text[startIndex:endIndex]` yields the annotated span.
* `text`: The raw text that has been annotated.
* `annotations`: An array of annotation details for this segment.

Each annotation has two fields:

* `annotationType`: The class of annotation (see ontology below).
* `entityType`: The class of the entity to which the text refers (see ontology below).

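
The instance above stores the utterance fields as parallel lists rather than as one object per turn. The following is a minimal sketch, not part of the official tooling, of how such an instance could be regrouped into per-turn records and how the span offsets could be checked. The helper names `iter_turns` and `spans_match` are hypothetical, the sample is a trimmed copy of the first two turns of the example, and treating `endIndex` as exclusive is an assumption read off that example.

```python
from typing import Any, Dict, Iterator

# Trimmed copy of the example instance above (first two turns only).
example: Dict[str, Any] = {
    "conversationId": "CCPE-6faee",
    "utterances": {
        "index": [0, 1],
        "speaker": [1, 0],
        "text": ["What kinds of movies do you like?", "I really like comedy movies."],
        "segments": [
            {"annotations": [{"annotationType": [], "entityType": []}],
             "startIndex": [0], "endIndex": [0], "text": [""]},
            {"annotations": [{"annotationType": [0], "entityType": [0]},
                             {"annotationType": [1], "entityType": [0]}],
             "startIndex": [14, 0], "endIndex": [20, 27],
             "text": ["comedy", "I really like comedy movies"]},
        ],
    },
}


def iter_turns(conversation: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one record per utterance from the column-oriented layout (hypothetical helper)."""
    utt = conversation["utterances"]
    for idx, speaker, text, seg in zip(
        utt["index"], utt["speaker"], utt["text"], utt["segments"]
    ):
        spans = [
            {"start": start, "end": end, "text": span_text}
            for start, end, span_text in zip(
                seg["startIndex"], seg["endIndex"], seg["text"]
            )
            if span_text  # skip the empty placeholder used for unannotated turns
        ]
        yield {"index": idx, "speaker": speaker, "text": text, "spans": spans}


def spans_match(turn: Dict[str, Any]) -> bool:
    """Check that every span text equals the slice of the utterance (assumes exclusive end)."""
    return all(
        turn["text"][span["start"]:span["end"]] == span["text"]
        for span in turn["spans"]
    )


turns = list(iter_turns(example))
for turn in turns:
    print(turn["index"], turn["speaker"], turn["spans"], spans_match(turn))
```

Regrouping by turn keeps each utterance next to its spans, which is usually more convenient for span-labeling code than the parallel lists.
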
**EXPLANATION OF ONTOLOGY**

In the corpus, preferences and the entities that these preferences refer to are annotated with an annotation type as well as an entity type.

Annotation types fall into four categories:

* `ENTITY_NAME` (0): These mark the names of relevant entities mentioned.
* `ENTITY_PREFERENCE` (1): These are defined as statements indicating that the dialog participant does or does not like the relevant entity in general, or that they do or do not like some aspect of the entity. This may also be thought of as the participant having some sentiment about what is being discussed.
* `ENTITY_DESCRIPTION` (2): Neutral descriptions that describe an entity but do not convey an explicit liking or disliking.
* `ENTITY_OTHER` (3): Other relevant statements about an entity that convey relevant information of how the participant relates to the entity but do not provide a sentiment. Most often, these relate to whether a participant has seen a particular movie, or knows a lot about a given entity.

Entity types are marked as belonging to one of four categories:

* `MOVIE_GENRE_OR_CATEGORY` (0): For genres or general descriptions that capture a particular type or style of movie.
* `MOVIE_OR_SERIES` (1): For the full or partial name of a movie or series of movies.
* `PERSON` (2): For the full or partial name of an actual person.
* `SOMETHING_ELSE` (3): For other important proper nouns, such as the names of characters or locations.

### Data Splits

There is a single split of the dataset named 'train' which contains the whole dataset.

|                     | Train |
| ------------------- | ----- |
| Input Conversations | 502   |

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[Creative Commons Attribution 4.0 License](https://creativecommons.org/licenses/by/4.0/)

### Citation Information

```
@inproceedings{radlinski-etal-2019-ccpe,
  title = {Coached Conversational Preference Elicitation: A Case Study in Understanding Movie Preferences},
  author = {Filip Radlinski and Krisztian Balog and Bill Byrne and Karthik Krishnamoorthi},
  booktitle = {Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue ({SIGDIAL})},
  year = 2019
}
```

### Contributions

Thanks to [@vineeths96](https://github.com/vineeths96) for adding this dataset.
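
As a usage note on the ontology given under Data Fields, the integer codes in `annotationType` and `entityType` can be mapped back to their names. This is a minimal sketch under stated assumptions: the two name lists are copied from the ontology in this card, the helper `decode_segment` is hypothetical, and pairing `annotations[i]` with `text[i]` is inferred from the example instance rather than documented.

```python
from typing import Any, Dict, List, Tuple

# Label names in the order of the integer codes listed in the ontology.
ANNOTATION_TYPES = [
    "ENTITY_NAME",
    "ENTITY_PREFERENCE",
    "ENTITY_DESCRIPTION",
    "ENTITY_OTHER",
]
ENTITY_TYPES = [
    "MOVIE_GENRE_OR_CATEGORY",
    "MOVIE_OR_SERIES",
    "PERSON",
    "SOMETHING_ELSE",
]

# The `segments` entry for the user turn
# "I have seen the movie Jurassic World: Fallen Kingdom." from the example instance.
segment: Dict[str, Any] = {
    "annotations": [
        {"annotationType": [0], "entityType": [1]},
        {"annotationType": [3], "entityType": [1]},
    ],
    "startIndex": [22, 0],
    "endIndex": [52, 52],
    "text": [
        "Jurassic World: Fallen Kingdom",
        "I have seen the movie Jurassic World: Fallen Kingdom",
    ],
}


def decode_segment(seg: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (span text, annotation name, entity name) triples (hypothetical helper).

    Assumes annotations[i] describes text[i]; empty placeholder entries
    carry empty code lists and therefore produce no triples.
    """
    triples = []
    for span_text, ann in zip(seg["text"], seg["annotations"]):
        for ann_code, ent_code in zip(ann["annotationType"], ann["entityType"]):
            triples.append(
                (span_text, ANNOTATION_TYPES[ann_code], ENTITY_TYPES[ent_code])
            )
    return triples


decoded = decode_segment(segment)
for triple in decoded:
    print(triple)
```

Filtering the triples on `ENTITY_PREFERENCE` would then give the statements in which the user expresses a like or dislike.
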


Provider:

maas

Created:

2025-07-07
