Part-of-Speech Tagging Dataset for Patent Claims
Source: arXiv | Updated: 2016-05-06 | Indexed: 2024-06-21
Download link:
http://www.ece.drexel.edu/walsh/aspitrg/Release1.0.zip
Resource description:
The part-of-speech tagging dataset for patent claims was created by the Department of Electrical and Computer Engineering at Drexel University to address the shortcomings of existing natural language processing (NLP) software when parsing patent claims. The dataset contains 174,246 tags, collected mainly through four data collection campaigns on Amazon Mechanical Turk, each larger in scale and scope than the last. To build it, patent claims were split into short segments, initial part-of-speech tags were obtained from the Stanford Parser, and those tags were then manually verified and corrected. The dataset is mainly used to improve automatic patent claim parsing systems and to raise the accuracy of patent topic classification.
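The three steps in the description (segment, pre-tag, correct) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the fixed segment length, and the placeholder tagger are hypothetical, and the placeholder stands in for the Stanford Parser rather than calling it. It does not reproduce the dataset authors' actual code or file format.

```python
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # (token, part-of-speech tag) pairs


def split_claim(claim: str, max_tokens: int = 8) -> List[List[str]]:
    """Split a claim into consecutive segments of at most max_tokens tokens.

    Whitespace tokenization and a fixed segment length are assumptions.
    """
    tokens = claim.split()
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def placeholder_tagger(tokens: List[str]) -> Tagged:
    """Hypothetical stand-in for the automatic parser: tags every token as NN."""
    return [(tok, "NN") for tok in tokens]


def apply_corrections(tagged: Tagged, corrections: Dict[int, str]) -> Tagged:
    """Overwrite the tag at each corrected position; keep all other tags."""
    return [(tok, corrections.get(i, tag)) for i, (tok, tag) in enumerate(tagged)]


claim = "A device comprising a housing and a sensor mounted within said housing"
segments = split_claim(claim, max_tokens=6)

# Initial automatic tags for the first segment, then a reviewer's fixes
# expressed as {token position: corrected tag}.
initial = placeholder_tagger(segments[0])
reviewed = apply_corrections(initial, {0: "DT", 2: "VBG", 3: "DT"})

for token, tag in reviewed:
    print(f"{token}\t{tag}")
```

Storing corrections as position-to-tag overrides keeps the automatic output intact, so the initial and reviewed tags can be compared later.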
Provider:
Department of Electrical and Computer Engineering, Drexel University
Created:
2016-05-06



