
d0rj/conv_ai_3_ru

Source: Hugging Face. Updated: 2023-05-28. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/d0rj/conv_ai_3_ru
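As a minimal sketch of how the dataset could be fetched programmatically, the snippet below wraps the Hugging Face `datasets` library's `load_dataset` call. It assumes the library is installed and the Hub (or a mirror) is reachable; the helper name `load_conv_ai_3_ru` is hypothetical. Users of the mirror linked above commonly point the client at it by setting the `HF_ENDPOINT` environment variable before importing the library, but that setup is an assumption and is not stated on this page.

```python
# Repository id as shown in the page title and download link.
REPO_ID = "d0rj/conv_ai_3_ru"


def load_conv_ai_3_ru():
    """Download the dataset from the Hub and return its splits.

    The import is local so this module can be imported even when the
    `datasets` library is not installed (assumption: it is installed
    wherever the function is actually called).
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID)


# Example usage (requires network access, so it is left as a comment):
#   splits = load_conv_ai_3_ru()
#   first_row = splits["train"][0]
#   print(first_row["initial_request"], first_row["clarification_need"])
```

According to the card metadata, the returned object should expose a `train` and a `validation` split.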
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- translated
language:
- ru
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- conv_ai_3
task_categories:
- conversational
- text-classification
task_ids:
- text-scoring
paperswithcode_id: null
pretty_name: conv_ai_3 (ru)
tags:
- evaluating-dialogue-systems
dataset_info:
  features:
  - name: topic_id
    dtype: int32
  - name: initial_request
    dtype: string
  - name: topic_desc
    dtype: string
  - name: clarification_need
    dtype: int32
  - name: facet_id
    dtype: string
  - name: facet_desc
    dtype: string
  - name: question_id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  config_name: conv_ai_3
  splits:
  - name: train
    num_examples: 9176
  - name: validation
    num_examples: 2313
---

# Dataset Card for d0rj/conv_ai_3_ru

## Dataset Description

- **Homepage:** https://github.com/aliannejadi/ClariQ
- **Repository:** https://github.com/aliannejadi/ClariQ
- **Paper:** https://arxiv.org/abs/2009.11352

### Dataset Summary

This is a Russian translation of the [conv_ai_3](https://huggingface.co/datasets/conv_ai_3) dataset.

### Languages

Russian (translated from English).

## Dataset Structure

### Data Fields

- `topic_id`: the ID of the topic (`initial_request`).
- `initial_request`: the query (text) that initiates the conversation.
- `topic_desc`: a full description of the topic as it appears in the TREC Web Track data.
- `clarification_need`: a label from 1 to 4 indicating how much the topic needs clarification. If an `initial_request` is self-contained and needs no clarification, the label is 1. If an `initial_request` is completely ambiguous, so that a search engine cannot guess the user's intent before clarification, the label is 4.
- `facet_id`: the ID of the facet.
- `facet_desc`: a full description of the facet (information need) as it appears in the TREC Web Track data.
- `question_id`: the ID of the question.
- `question`: a clarifying question that the system can pose to the user for the current topic and facet.
- `answer`: an answer to the clarifying question, assuming that the user is in the context of the current row (i.e., the user's initial query is `initial_request`, their information need is `facet_desc`, and `question` has been posed to the user).

### Citation Information

    @misc{aliannejadi2020convai3,
      title={ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ)},
      author={Mohammad Aliannejadi and Julia Kiseleva and Aleksandr Chuklin and Jeff Dalton and Mikhail Burtsev},
      year={2020},
      eprint={2009.11352},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
    }

### Contributions

Thanks to [@rkc007](https://github.com/rkc007) for adding this dataset.
Provided by:
d0rj
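Summary of original information

Dataset overview

Basic information

  • Dataset name: conv_ai_3 (ru)
  • Language: Russian
  • License: unknown
  • Multilinguality: monolingual
  • Size category: 10K<n<100K
  • Source dataset: conv_ai_3
  • Task categories: conversational, text-classification
  • Task ID: text-scoring
  • Tags: evaluating-dialogue-systems

Dataset structure

Data fields

  • topic_id: topic ID, data type int32.
  • initial_request: initial request, data type string.
  • topic_desc: topic description, data type string.
  • clarification_need: clarification need, data type int32, ranging from 1 to 4.
  • facet_id: facet ID, data type string.
  • facet_desc: facet description, data type string.
  • question_id: question ID, data type string.
  • question: clarifying question, data type string.
  • answer: answer, data type string.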
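
The field list above can be mirrored as a small typed record. The following is a minimal sketch, not part of the dataset's tooling: the class name `ClarificationRow`, its helper methods, and the sample values are all hypothetical. Only the field names, their types, and the 1 to 4 range of `clarification_need` come from the card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClarificationRow:
    """One dataset row; field names and types follow the card's schema."""

    topic_id: int
    initial_request: str
    topic_desc: str
    clarification_need: int
    facet_id: str
    facet_desc: str
    question_id: str
    question: str
    answer: str

    def __post_init__(self) -> None:
        # The card documents labels from 1 (self-contained) to 4 (fully ambiguous).
        if not 1 <= self.clarification_need <= 4:
            raise ValueError(
                f"clarification_need must be in 1..4, got {self.clarification_need}"
            )

    def is_self_contained(self) -> bool:
        """True when the request needs no clarification (label 1)."""
        return self.clarification_need == 1

    def is_fully_ambiguous(self) -> bool:
        """True when intent cannot be guessed without clarification (label 4)."""
        return self.clarification_need == 4


# Placeholder values for illustration only; they are not taken from the dataset.
example = ClarificationRow(
    topic_id=1,
    initial_request="placeholder request",
    topic_desc="placeholder topic description",
    clarification_need=4,
    facet_id="F0001",
    facet_desc="placeholder facet description",
    question_id="Q0001",
    question="placeholder clarifying question",
    answer="placeholder answer",
)
print(example.is_fully_ambiguous())
```

Validating the label in `__post_init__` surfaces out-of-range values at construction time rather than later in a training or evaluation loop.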

Data splits

  • Training set: 9176 examples.
  • Validation set: 2313 examples.
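
As a quick arithmetic check on the split sizes above, the sketch below computes the total number of examples and each split's share, and confirms that the total falls inside the declared `10K<n<100K` size category. The variable names are illustrative; the counts are the ones listed on this page.

```python
# Split sizes as listed in the card metadata.
SPLIT_SIZES = {"train": 9176, "validation": 2313}

total = sum(SPLIT_SIZES.values())
fractions = {name: count / total for name, count in SPLIT_SIZES.items()}

# The declared size category is 10K < n < 100K.
in_declared_category = 10_000 < total < 100_000

print(f"total: {total} examples (within 10K<n<100K: {in_declared_category})")
for name, count in SPLIT_SIZES.items():
    print(f"{name}: {count} examples ({fractions[name]:.1%})")
# total: 11489 examples (within 10K<n<100K: True)
# train: 9176 examples (79.9%)
# validation: 2313 examples (20.1%)
```

The split is therefore roughly 80/20 between training and validation.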