CLEAN

Name: CLEAN
Creator: Zhejiang Sci-Tech University
Published: 2024-02-15 21:03:57
License: Not specified

Source: arXiv. Updated 2024-02-15. Indexed 2024-07-29.

Download link:

https://zhiyiluo.site/misc/clean_v1.0_sample.json

Resource overview:

The CLEAN dataset, created by Zhejiang Sci-Tech University, is a Chinese multi-span question answering dataset covering a wide range of open-domain topics. It contains 9,063 samples, about 76% of which require descriptive answers. The content is drawn from large Chinese online knowledge-sharing Q&A platforms such as Baidu Zhidao, and the dataset supports treating reading comprehension as an answer extraction task. To build it, one million questions were randomly crawled and then filtered for high-quality answers, so that the intent of each question is properly addressed. CLEAN is mainly intended for extracting multiple pieces of information in response to complex open-domain questions, and it aims to overcome the limited question selection of existing datasets.
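To make the "answer extraction" framing concrete, here is a minimal sketch of how a multi-span answer could be represented as character offsets into a context passage. The function name `locate_spans`, the example text, and the idea of storing answers as a list of strings are all illustrative assumptions; they are not taken from the CLEAN release, whose actual JSON schema is not described here.

```python
from typing import List, Tuple


def locate_spans(context: str, answers: List[str]) -> List[Tuple[int, int]]:
    """Return sorted (start, end) character offsets for each answer found in context.

    Hypothetical helper: assumes every answer is a verbatim substring of the
    context, which is what an extraction-style task requires.
    """
    spans = []
    for answer in answers:
        start = context.find(answer)  # first occurrence only
        if start != -1:  # skip answers that do not appear verbatim
            spans.append((start, start + len(answer)))
    return sorted(spans)


# Made-up example, not a sample from the dataset.
context = "Drink warm water, rest well, and see a doctor if the fever persists."
answers = ["Drink warm water", "rest well", "see a doctor if the fever persists"]
spans = locate_spans(context, answers)

# Each span slices back to the original answer text.
recovered = [context[start:end] for start, end in spans]
```

A single question here maps to several non-contiguous spans, which is the property that distinguishes multi-span question answering from single-span extraction.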

Provider:

Zhejiang Sci-Tech University

Created:

2024-02-15

Aggregated summary

Dataset introduction