d0rj/conv_ai_3_ru
Source: Hugging Face. Updated 2023-05-28; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/d0rj/conv_ai_3_ru
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- translated
language:
- ru
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- conv_ai_3
task_categories:
- conversational
- text-classification
task_ids:
- text-scoring
paperswithcode_id: null
pretty_name: conv_ai_3 (ru)
tags:
- evaluating-dialogue-systems
dataset_info:
  features:
  - name: topic_id
    dtype: int32
  - name: initial_request
    dtype: string
  - name: topic_desc
    dtype: string
  - name: clarification_need
    dtype: int32
  - name: facet_id
    dtype: string
  - name: facet_desc
    dtype: string
  - name: question_id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  config_name: conv_ai_3
  splits:
  - name: train
    num_examples: 9176
  - name: validation
    num_examples: 2313
---
# Dataset Card for d0rj/conv_ai_3_ru
## Dataset Description
- **Homepage:** https://github.com/aliannejadi/ClariQ
- **Repository:** https://github.com/aliannejadi/ClariQ
- **Paper:** https://arxiv.org/abs/2009.11352
### Dataset Summary
This is a Russian translation of the [conv_ai_3](https://huggingface.co/datasets/conv_ai_3) dataset.
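A minimal loading sketch, assuming the third-party Hugging Face `datasets` library is installed and that the repository ID `d0rj/conv_ai_3_ru` is still available on the Hub. The helper name `load_conv_ai_3_ru` is hypothetical; the split names are the ones listed in the card metadata.

```python
# Split names as listed in the card metadata.
KNOWN_SPLITS = ("train", "validation")


def load_conv_ai_3_ru(split: str = "train"):
    """Load one split of d0rj/conv_ai_3_ru (hypothetical helper).

    Assumes the third-party `datasets` package is installed and the
    Hub repository is reachable; neither is checked here.
    """
    if split not in KNOWN_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {KNOWN_SPLITS}")
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset("d0rj/conv_ai_3_ru", split=split)
```

Calling `load_conv_ai_3_ru("validation")` should return the validation split, provided the assumptions above hold.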
### Languages
Russian (translated from English).
## Dataset Structure
### Data Fields
- `topic_id`: the ID of the topic (`initial_request`).
- `initial_request`: the query (text) that initiates the conversation.
- `topic_desc`: a full description of the topic as it appears in the TREC Web Track data.
- `clarification_need`: a label from 1 to 4 indicating how much the topic needs clarification. If an `initial_request` is self-contained and needs no clarification, the label is 1. If an `initial_request` is so ambiguous that a search engine cannot guess the user's intent without clarification, the label is 4.
- `facet_id`: the ID of the facet.
- `facet_desc`: a full description of the facet (information need) as it appears in the TREC Web Track data.
- `question_id`: the ID of the question.
- `question`: a clarifying question that the system can pose to the user for the current topic and facet.
- `answer`: an answer to the clarifying question, assuming that the user is in the context of the current row (i.e., the user's initial query is `initial_request`, their information need is `facet_desc`, and `question` has been posed to the user).
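The field list above can be sketched as a typed record. This is an illustrative sketch only: the names `ClarificationRow` and `questions_by_topic` are hypothetical, and the range check assumes every row carries a `clarification_need` label between 1 and 4, as the field description states.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClarificationRow:
    """One dataset row; field names and types follow the card metadata."""

    topic_id: int
    initial_request: str
    topic_desc: str
    clarification_need: int  # 1 = self-contained, 4 = fully ambiguous
    facet_id: str
    facet_desc: str
    question_id: str
    question: str
    answer: str

    def __post_init__(self):
        # Assumption: labels are always in the documented 1..4 range.
        if not 1 <= self.clarification_need <= 4:
            raise ValueError(
                f"clarification_need out of range: {self.clarification_need}"
            )


def questions_by_topic(rows):
    """Map each topic_id to the distinct clarifying questions seen for it."""
    grouped = defaultdict(list)
    for row in rows:
        # The same question can recur across facets of one topic; keep it once.
        if row.question not in grouped[row.topic_id]:
            grouped[row.topic_id].append(row.question)
    return dict(grouped)
```

A record dictionary keyed by the nine field names can be converted with `ClarificationRow(**record)`. Grouping by `topic_id` reflects that one topic may have several facets and several candidate clarifying questions.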
### Citation Information
@misc{aliannejadi2020convai3,
title={ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ)},
author={Mohammad Aliannejadi and Julia Kiseleva and Aleksandr Chuklin and Jeff Dalton and Mikhail Burtsev},
year={2020},
eprint={2009.11352},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
### Contributions
Thanks to [@rkc007](https://github.com/rkc007) for adding this dataset.
Provider:
d0rj
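Summary of original information
Dataset overview
Basic information
- Dataset name: conv_ai_3 (ru)
- Language: Russian
- License: unknown
- Multilinguality: monolingual
- Size category: 10K<n<100K
- Source dataset: conv_ai_3
- Task categories: conversational, text classification
- Task ID: text-scoring
- Tag: evaluating dialogue systems
Dataset structure
Data fields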
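- `topic_id`: topic ID, data type int32.
- `initial_request`: initial request, data type string.
- `topic_desc`: topic description, data type string.
- `clarification_need`: clarification need, data type int32, ranging from 1 to 4.
- `facet_id`: facet ID, data type string.
- `facet_desc`: facet description, data type string.
- `question_id`: question ID, data type string.
- `question`: clarifying question, data type string.
- `answer`: answer, data type string.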
Data splits
- Training set: 9176 examples.
- Validation set: 2313 examples.
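As a quick consistency check, the two split sizes sum to 11,489 rows, which falls inside the declared 10K<n<100K size category, and the training split holds roughly 80% of the rows. The short sketch below reproduces that arithmetic; the variable names are illustrative.

```python
# Split sizes as stated in the card metadata.
SPLIT_SIZES = {"train": 9176, "validation": 2313}

total = sum(SPLIT_SIZES.values())
train_share = SPLIT_SIZES["train"] / total

# The declared size category is 10K<n<100K.
in_declared_category = 10_000 < total < 100_000

print(f"total={total}, train share={train_share:.1%}, "
      f"within 10K<n<100K: {in_declared_category}")
```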



