proxectonos/UD_Galician-PUD
Source: Hugging Face | Updated: 2025-12-17 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/proxectonos/UD_Galician-PUD
Description:
A Galician treebank containing 1,000 manually reviewed sentences aligned with the other available PUD (Parallel Universal Dependencies) treebanks. The sentences are taken from news sources (sentence IDs starting with n) and from Wikipedia (sentence IDs starting with w); they were randomly selected, with no more than a few sentences per document. The next two digits of the sentence ID encode the sentence's original language: the first 750 sentences were originally English (01), and the remaining 250 were originally German (02), French (03), Italian (04) or Spanish (05); these 250 were translated into the other languages via English. The entire treebank is labeled as a test set. It was annotated in two steps: first automatically with state-of-the-art Galician NLP tools, then reviewed by two experts, with high inter-annotator agreement.
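The sentence-ID convention described above can be decoded mechanically. The following is a minimal Python sketch, assuming the IDs appear in "# sent_id = ..." comment lines as in standard CoNLL-U files; the function names and the sample IDs (e.g. n01001011) are hypothetical illustrations rather than values taken from the dataset. Only the first character (source) and the next two digits (original language) are interpreted, since the card does not document the remaining characters.

```python
from collections import Counter

# Source prefixes and original-language codes, as listed in the dataset card.
SOURCES = {"n": "news", "w": "wikipedia"}
ORIGINAL_LANGUAGES = {
    "01": "English",
    "02": "German",
    "03": "French",
    "04": "Italian",
    "05": "Spanish",
}


def parse_sent_id(sent_id: str) -> tuple[str, str]:
    """Return (source, original_language) for a PUD-style sentence ID.

    Only the first character and the next two digits are interpreted;
    any remaining characters are ignored.
    """
    if len(sent_id) < 3:
        raise ValueError(f"sentence id too short: {sent_id!r}")
    source = SOURCES.get(sent_id[0])
    language = ORIGINAL_LANGUAGES.get(sent_id[1:3])
    if source is None or language is None:
        raise ValueError(f"unrecognised sentence id: {sent_id!r}")
    return source, language


def tally_languages(conllu_text: str) -> Counter:
    """Count original languages using the '# sent_id = ...' comment lines."""
    counts: Counter = Counter()
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            counts[parse_sent_id(sent_id)[1]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical IDs, chosen only to exercise the decoding rules.
    print(parse_sent_id("n01001011"))  # ('news', 'English')
    print(parse_sent_id("w05003002"))  # ('wikipedia', 'Spanish')
    sample = "\n".join([
        "# sent_id = n01001011",
        "# text = ...",
        "# sent_id = w01002005",
        "# sent_id = n03004001",
    ])
    print(tally_languages(sample))
```

Run over the full test file, a tally like this should reproduce the documented 750/250 split between originally English and non-English sentences, which makes it a quick sanity check on a downloaded copy.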
Provided by:
proxectonos