WikiGoldSK
Source: arXiv. Updated: 2023-04-08. Indexed: 2024-06-21.
Download link:
https://github.com/NaiveNeuron/WikiGoldSK
Description:
WikiGoldSK, created at Comenius University in Bratislava, is the first large-scale, manually annotated dataset for Slovak named entity recognition (NER). It contains 412 articles sampled from Slovak Wikipedia, annotated with four entity categories: location, person, organization, and other. The articles were first pre-annotated with the SlovakBERT model, and three native Slovak speakers then proofread and corrected the labels to ensure high-quality annotations. WikiGoldSK is intended to address the scarcity of Slovak NER data and to support Slovak NLP research and applications.
Provider:
Comenius University in Bratislava
Created:
2023-04-08



