
Romanian Named Entity Recognition in the Legal domain (LegalNERo)

Indexed from NIAID Data Ecosystem on 2026-03-13
Download link:
https://zenodo.org/record/4772094
Description:
LegalNERo is a manually annotated corpus for named entity recognition in the Romanian legal domain. It provides gold annotations for organizations, locations, persons, time and legal resources mentioned in legal documents. Additionally, it offers GEONAMES codes for the named entities annotated as locations (where a link could be established).

The LegalNERo corpus is available in different formats: span-based, token-based and RDF. The Linguistic Linked Open Data (LLOD) version is provided in RDF-Turtle format. CONLLUP files conform to the CoNLL-U Plus format (https://universaldependencies.org/ext-format.html). Part-of-speech tagging was performed using UDPIPE. Named entity annotations are placed in the column "RELATE:NE" (the 11th column), as defined in the "global.columns" metadata field. Similarly, GEONAMES references are in the column "RELATE:GEONAMES" (the 12th and last column). Automatic processing was performed through the RELATE platform (https://relate.racai.ro). ANN files conform to the BRAT format (https://brat.nlplab.org/).

The archive contains:

- ann_LEGAL_PER_LOC_ORG_TIME_overlap
      Folder in which all files are in .ann format and contain annotations of legal resources mentioned, persons, locations, organizations and time.
      Overlapping annotations of organizations and time entities inside legal references were allowed.
- ann_LEGAL_PER_LOC_ORG_TIME
      Folder in which all files are in .ann format and contain annotations of legal resources mentioned, persons, locations, organizations and time.
      Overlapping annotations were not allowed and only the longest named entities were annotated.
- ann_PER_LOC_ORG_TIME
      Folder in which all files are in .ann format and contain annotations of persons, locations, organizations and time.
      There are no overlapping annotations.
- conllup_LEGAL_PER_LOC_ORG_TIME
      Folder in which all files are in .conllup format and contain annotations of legal resources mentioned, persons, locations, organizations and time.
      Overlapping annotations were not allowed and only the longest named entities were annotated.
      The annotation of these files was enhanced with GEONAMES codes (where linking was possible).
- conllup_PER_LOC_ORG_TIME
      Folder in which all files are in .conllup format and contain annotations of persons, locations, organizations and time.
      Overlapping annotations were not allowed and only the longest named entities were annotated.
      The annotation of these files was enhanced with GEONAMES codes (where linking was possible).
- rdf
      Folder containing the corpus in RDF-Turtle format.
      All the annotations are available here in both span and token format.
- text
      Folder containing the raw texts.

NER System

A NER model generated using the LegalNERo corpus can be used online in the RELATE platform: https://relate.racai.ro/index.php?path=ner/demo

This system was described in:
Păiș, Vasile and Mitrofan, Maria and Gasan, Carol Luca and Coneschi, Vlad and Ianov, Alexandru. Named Entity Recognition in the Romanian Legal Domain. In Proceedings of the Natural Legal Language Processing Workshop 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 9-18, November 2021.

LICENSING

This work is provided under the license CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives 4.0 International). The license can be viewed online here: https://creativecommons.org/licenses/by-nc-nd/4.0/ and the full text here: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode

CONTACT

Research Institute for Artificial Intelligence "Mihai Draganescu", Romanian Academy
Web: http://www.racai.ro
Contact emails: vasile@racai.ro, maria@racai.ro
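
As an illustration of the CoNLL-U Plus layout described above, the following is a minimal Python sketch (not part of the dataset or of the RELATE platform) that reads the "global.columns" header and looks up "RELATE:NE" and "RELATE:GEONAMES" by name instead of by fixed position. The function name read_conllup, the sample tokens, the "B-LOC"/"O" tag values and the GEONAMES value "000000" are all invented for the example; the actual tag scheme and codes should be checked against the files in the archive.

```python
from typing import Dict, Iterable, Iterator, List


def read_conllup(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of {column name: value} dicts (hypothetical helper)."""
    columns: List[str] = []
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns"):
            # Column names are declared once in the file header.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment/metadata lines are skipped
        elif not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        else:
            if not columns:
                raise ValueError("token line found before '# global.columns'")
            sentence.append(dict(zip(columns, line.split("\t"))))
    if sentence:
        yield sentence


# Invented sample: tag values and the GEONAMES placeholder are assumptions,
# not taken from the corpus.
HEADER = ("# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL "
          "DEPS MISC RELATE:NE RELATE:GEONAMES")
ROWS = [
    ["1", "în", "_", "_", "_", "_", "_", "_", "_", "_", "O", "_"],
    ["2", "București", "_", "_", "_", "_", "_", "_", "_", "_", "B-LOC", "000000"],
    ["3", ".", "_", "_", "_", "_", "_", "_", "_", "_", "O", "_"],
]
SAMPLE = "\n".join([HEADER, "# sent_id = demo"] + ["\t".join(r) for r in ROWS] + [""])

if __name__ == "__main__":
    for sent in read_conllup(SAMPLE.splitlines()):
        for tok in sent:
            print(tok["FORM"], tok["RELATE:NE"], tok["RELATE:GEONAMES"])
```

To process a real file, the same function can be given an open file handle, for example read_conllup(open(path, encoding="utf-8")), where path points to a file in one of the conllup_* folders.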
Created:
2022-08-26