CSED
Source: arXiv · Updated: 2023-05-09 · Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2305.05183v1
Description:
CSED is a Chinese semantic error diagnosis dataset built by the Social Computing and Information Retrieval Research Center, Harbin Institute of Technology. It consists of two sub-datasets, CSED-R and CSED-C, totaling 62,060 instances. CSED-R (49,408 instances) is used to judge whether a sentence contains a semantic error; CSED-C (12,652 instances) is used to rewrite semantically erroneous sentences into correct ones. The data were crawled from middle school examination resources on the web and then professionally annotated to ensure quality. The dataset targets applications in education and in news and publishing, where recognizing and correcting Chinese semantic errors is needed.
Provided by:
Social Computing and Information Retrieval Research Center, Harbin Institute of Technology
Created:
2023-05-09



