CSS

Source: arXiv. Updated: 2023-06-07. Indexed: 2024-06-21.
Download link:
https://github.com/maybenotime/CSS
Description:
CSS (Chinese Sentence Simplification) is a dataset created by research groups at Peking University for evaluating Chinese sentence simplification models. It contains 766 manually simplified sentences, two for each original sentence. The original sentences were randomly sampled from the PFR corpus of People's Daily and then simplified by hand. A distinguishing feature of CSS is that it includes labels for several simplification operations, such as lexical simplification, sentence splitting, compression, and sentence rewriting. The dataset is intended mainly for evaluating and improving Chinese sentence simplification methods, particularly those aimed at non-native readers and people with reading disabilities.
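The structure described above (one original sentence, two manual simplifications, and operation labels) could be modeled roughly as follows. This is only an illustrative sketch: the class and field names (`CSSRecord`, `Simplification`, `Operation`) are hypothetical, the placeholder strings are not real data, and nothing here reflects the actual file format in the repository. Whether labels attach to each simplification or at another level is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Operation(Enum):
    # The four operation types named in the description; the enum itself is hypothetical.
    LEXICAL_SIMPLIFICATION = "lexical simplification"
    SENTENCE_SPLITTING = "sentence splitting"
    COMPRESSION = "compression"
    SENTENCE_REWRITING = "sentence rewriting"


@dataclass
class Simplification:
    text: str
    # Assumption: labels are attached per simplification.
    operations: set = field(default_factory=set)


@dataclass
class CSSRecord:
    original: str
    simplifications: list

    def __post_init__(self):
        # The description states two simplified versions per original sentence.
        if len(self.simplifications) != 2:
            raise ValueError("expected exactly two simplifications per original")


def count_simplifications(records):
    """Total number of simplified sentences across all records."""
    return sum(len(r.simplifications) for r in records)


# Placeholder record, not taken from the dataset.
example = CSSRecord(
    original="<original sentence>",
    simplifications=[
        Simplification("<simplified version 1>", {Operation.COMPRESSION}),
        Simplification("<simplified version 2>", {Operation.SENTENCE_SPLITTING}),
    ],
)
```

Under this model, the stated 766 simplifications at two per original would correspond to 383 records; that figure is simple arithmetic from the numbers above, not a count given in the description.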
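Provided by:
Institute of Computer Science and Technology, Peking University; Center for Data Science, Peking University; Key Laboratory of Computational Linguistics, Ministry of Education
Created:
2023-06-07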