Nested Named Entity Recognition Dataset of Chinese Imperial Civil Service Examination Documents
Source: DataCite Commons | Updated: 2025-08-01 | Indexed: 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=0d33017a177a4b30bc5c127061eda039
Description:
The nested named entity recognition (NER) dataset for imperial civil service examination documents was built through a systematic multi-stage process designed to keep annotation accurate and consistent. First, the research team drew up detailed annotation standards to provide unified guidance for all subsequent work. Next, 4 master's students from the School of Literature with a solid foundation in linguistics were selected from 8 candidates to form the annotation team, and they received specialized training in ancient book knowledge and the doccano annotation tool. During formal annotation, a two-annotator independent strategy was used: two annotators independently labeled the same text. Once annotation was complete, the team evaluated the consistency of the results, submitted samples on which the annotators disagreed to domain experts for review and adjudication, and had the experts conduct a final review of all annotations to further ensure data quality. The resulting dataset contains 2238 text instances covering 11185 entities of 18 types, including 1868 nested entities and 1130 polysemous entities. The data sources include the Ming Dynasty's "Shilu" series (368 entries), "Huangming Gongju Kao" (328 entries), "Xianzheng Lu" (121 entries), "Xiangkao Lu" (96 entries), "Jinshi Deng Ke Kao" (381 entries), "Lei Xing Deng Ke Kao" (216 entries), "Jinshi Xu Ci Lu" (142 entries), and "Guo Dynasty Calendar Inscription Stele Collection" (189 entries).
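The consistency evaluation step described above can be illustrated with a minimal sketch. The description does not say which agreement metric the team used, so exact-match span F1 is an assumption here, as are the `(start, end, label)` span representation, the function names, and the `TITLE` and `OFFICE` labels, which are hypothetical and not taken from the dataset's 18 entity types.

```python
from typing import Set, Tuple

# A span is (start offset, end offset exclusive, label). Assumed representation.
Span = Tuple[int, int, str]


def span_agreement(a: Set[Span], b: Set[Span]) -> float:
    """Exact-match F1 between two annotators' span sets for one text."""
    if not a and not b:
        return 1.0  # both annotators marked nothing: treat as full agreement
    matched = len(a & b)
    return 2 * matched / (len(a) + len(b))


def disputed_spans(a: Set[Span], b: Set[Span]) -> Set[Span]:
    """Spans marked by only one annotator, i.e. candidates for expert review."""
    return a ^ b


def nested_spans(spans: Set[Span]) -> Set[Span]:
    """Spans that lie inside another, strictly longer span."""
    return {
        inner
        for inner in spans
        for outer in spans
        if outer[0] <= inner[0]
        and inner[1] <= outer[1]
        and (outer[1] - outer[0]) > (inner[1] - inner[0])
    }


# Hypothetical annotations of the same 5-character text by two annotators.
annotator_a = {(0, 5, "TITLE"), (0, 3, "OFFICE")}
annotator_b = {(0, 5, "TITLE"), (0, 2, "OFFICE")}

print(span_agreement(annotator_a, annotator_b))
print(sorted(disputed_spans(annotator_a, annotator_b)))
print(sorted(nested_spans(annotator_a)))
```

In this toy case the annotators agree on the outer span but disagree on the boundary of the inner one, so both inner spans would be passed on for adjudication. Comparing whole span tuples rather than per-character tags is what lets overlapping (nested) entities be scored without flattening them.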
Provider:
Science Data Bank
Created:
2025-07-25



