Hanja Understanding Evaluation (HUE) dataset
Source: arXiv | Updated: 2022-10-11 | Indexed: 2024-06-21
Download link:
https://github.com/haneul-yoo/HUE.git
Description:
The HUE dataset was developed to help historians understand classical Korean documents written in Hanja. It comprises four tasks, chronological attribution, topic classification, named entity recognition, and summary retrieval, which are intended for building and evaluating Hanja language models. The dataset draws on two major archival collections spanning the 14th to 19th centuries: the Annals of the Joseon Dynasty and the Diaries of the Royal Secretariat. Intended applications include speeding up expert translation and helping the general public grasp the basic content of these historical documents.
Provided by:
Korea Advanced Institute of Science and Technology (KAIST)
Created:
2022-10-11



