Sentence Compression
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/google-research-datasets/sentence-compression
Resource description:
GSC is a dataset of training and test sentences for the sentence compression task, covering diverse content and varied grammatical structures. After filtering, every sentence is limited to at most 50 BPE tokens, which gives an average training-sentence length of 23.1 words. The dataset comprises 200,000 training sentences and 9,995 test sentences, and its listed task type is sentence generation (sampling).
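The length filter described above can be illustrated with a minimal sketch. It assumes a pluggable `tokenize` callable; the whitespace tokenizer used here is only a stand-in for a real BPE tokenizer, and the function names and sample sentences are hypothetical, not part of the dataset's tooling.

```python
from typing import Callable, List, Sequence


def filter_by_token_limit(
    sentences: Sequence[str],
    tokenize: Callable[[str], List[str]],
    max_tokens: int = 50,
) -> List[str]:
    """Keep only sentences whose token count is at most max_tokens."""
    return [s for s in sentences if len(tokenize(s)) <= max_tokens]


def mean_word_length(sentences: Sequence[str]) -> float:
    """Average number of whitespace-separated words per sentence."""
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def whitespace_tokenize(sentence: str) -> List[str]:
    # Assumption: stand-in for a BPE tokenizer, which would normally
    # produce more tokens than words for the same sentence.
    return sentence.split()


# Hypothetical sample data, not taken from GSC.
sample = [
    "The cat sat on the mat .",
    "word " * 60,  # 60 tokens, exceeds the limit and is dropped
    "Short one .",
]

kept = filter_by_token_limit(sample, whitespace_tokenize, max_tokens=50)
print(len(kept), mean_word_length(kept))
```

Swapping `whitespace_tokenize` for an actual BPE tokenizer would change which sentences pass the limit, since the limit is counted in subword tokens while the reported average is counted in words.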
Provider:
Google Research



