German Legal Document Named Entity Recognition Dataset
Source: arXiv (2020-03-29); updated 2024-06-21
Download link:
https://github.com/elenanereiss/Legal-Entity-Recognition
Resource description:
The German Legal Document Named Entity Recognition (NER) Dataset was developed by the German Research Center for Artificial Intelligence (DFKI) for named entity recognition in decisions of German federal courts. It contains approximately 67,000 sentences and more than 2 million tokens, with about 54,000 manually annotated entities assigned to 19 fine-grained semantic classes. These classes cover a range of entity types found in legal texts, such as persons, judges, lawyers, countries, and cities. The court decisions were analyzed and annotated by hand to provide training data for NER services on German legal documents, in particular within the EU project Lynx. The dataset is intended primarily for legal text analysis, where it helps address the difficulty of recognizing entities in legal documents.
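The figures above (sentences, tokens, and annotated entities per class) are the kind of statistics that can be recomputed from token-level annotation files. The following is a minimal sketch, assuming a CoNLL-style layout with one `token tag` pair per line, BIO tags, and blank lines between sentences; check the repository README for the actual file layout. The helper names `read_conll`, `extract_entities`, and `corpus_stats` are hypothetical, and the two sample sentences with their tags (`GRT`, `GS`, `ST`) are illustrative only, not taken from the dataset.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, BIO tag) pairs

# Illustrative input only (assumption): "token tag" per line, blank line between sentences.
SAMPLE = """Das O
Bundesverfassungsgericht B-GRT
verweist O
auf O
§ B-GS
823 I-GS
BGB I-GS
. O

Der O
Kläger O
wohnt O
in O
Berlin B-ST
. O
"""


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group token lines into sentences; blank lines end a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: first column is the token, last column is the BIO tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (label, surface text) spans."""
    entities: List[Tuple[str, str]] = []
    label: Optional[str] = None
    tokens: List[str] = []
    for token, tag in sentence:
        # A B- tag, or an I- tag whose label differs from the open span, starts a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:
                entities.append((label, " ".join(tokens)))
            label, tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            tokens.append(token)  # continue the open span
        else:  # "O" closes any open span
            if label is not None:
                entities.append((label, " ".join(tokens)))
            label, tokens = None, []
    if label is not None:
        entities.append((label, " ".join(tokens)))
    return entities


def corpus_stats(sentences: List[Sentence]) -> Dict[str, object]:
    """Count sentences, tokens, entities, and entities per label."""
    labels = Counter(lab for s in sentences for lab, _ in extract_entities(s))
    return {
        "sentences": len(sentences),
        "tokens": sum(len(s) for s in sentences),
        "entities": sum(labels.values()),
        "labels": labels,
    }


if __name__ == "__main__":
    parsed = read_conll(SAMPLE.splitlines())
    print(corpus_stats(parsed))
    print(extract_entities(parsed[0]))
```

On the sample this reports 2 sentences, 14 tokens, and 3 entities, and the multi-token span `§ 823 BGB` is counted once. Counting entities as merged spans rather than as tagged tokens is what makes an entity total comparable to figures such as the roughly 54,000 entities cited above.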
Provided by:
German Research Center for Artificial Intelligence (DFKI)
Created:
2020-03-29



