WebQA v1.0: Baidu Chinese Question-Answering Dataset

超神经 (HyperAI) · Updated 2023-12-26 · Indexed 2024-05-15

Download link:

https://hyper.ai/cn/datasets/28467

Resource overview:

WebQA v1.0 is a dataset open-sourced by Baidu in 2016, with data drawn from Baidu Zhidao. Each question is paired with multiple articles that carry roughly the same meaning. Overall quality is moderate, because many of the articles were added through retrieval rather than written or checked by annotators. Articles fall into two types: manually annotated (ANN) and browser-retrieved (IR). Each question-article pair is labeled as either answerable (positive) or unanswerable (other_negative).
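The structure described above (one question, several articles, each tagged by source and answerability) can be sketched in Python as follows. This is only an illustration: the class names, field names, and the sample record are hypothetical and are not taken from the released files, whose actual schema may differ. Only the tag values ANN, IR, positive, and other_negative come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    text: str
    source: str  # "ANN" (manually annotated) or "IR" (retrieved)
    label: str   # "positive" (answerable) or "other_negative" (unanswerable)


@dataclass
class QuestionRecord:
    question: str
    articles: List[Article]


def answerable_articles(record: QuestionRecord,
                        source: Optional[str] = None) -> List[Article]:
    """Return the articles labeled answerable, optionally limited to one source."""
    return [
        a for a in record.articles
        if a.label == "positive" and (source is None or a.source == source)
    ]


# Made-up sample record, for illustration only (not real dataset content).
example = QuestionRecord(
    question="中国的首都是哪里？",
    articles=[
        Article("中国的首都是北京。", "ANN", "positive"),
        Article("北京是中华人民共和国的首都。", "IR", "positive"),
        Article("上海是中国最大的城市之一。", "IR", "other_negative"),
    ],
)

clean = answerable_articles(example, source="ANN")
print(len(clean), len(answerable_articles(example)))
```

Filtering to ANN-only answerable articles is one way to work around the moderate overall quality noted above, at the cost of a smaller training set.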

Created:

2023-12-26

Collected summary

Dataset introduction

Background and challenges

Background overview

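WebQA v1.0 is a Chinese question-answering dataset open-sourced by Baidu in 2016. Its data comes from Baidu Zhidao, and each question is paired with multiple semantically similar articles. The dataset is of moderate quality. It contains both manually annotated and browser-retrieved articles, distinguishes answerable from unanswerable answers, and is suited to natural language processing tasks such as training question-answering systems.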

The content above was collected and summarized by 遇见数据集.