POSH, the POS-tagged Headline corpus
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/UniversalDependencies/UD_English-EWT/tree/r2.6
Description:
POSH is a new corpus of 5,248 English news headlines annotated with part-of-speech (POS) tags, built to improve the performance of POS tagging models on news headlines. Six skilled English annotators labeled the headlines with the Universal Dependencies (UD) POS tag set, which was chosen for its coarser granularity. Size: 5,248 headlines. Task: part-of-speech tagging.
Provider:
Google sentence compression corpus



