EdinburghNLP/nunavut-hansard-plusplus
Source: Hugging Face | Updated 2025-09-06 | Indexed 2025-09-13
Download link:
https://hf-mirror.com/datasets/EdinburghNLP/nunavut-hansard-plusplus
Description:
The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0++ is a processed corpus of parallel text in English and Inuktitut (including a syllabics version). The dataset has been deduplicated and blank lines have been removed, and morphological and part-of-speech information has been added for both languages. For English, this information was added with spaCy's en_core_web_trf model; for Inuktitut, it comes from automatic neural and rule-based morphological analysis and includes a translation of the lemma where possible.
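
To make the cleaning steps mentioned in the description concrete, here is a minimal sketch of deduplicating a line-aligned parallel text and dropping blank lines. It is not the maintainers' actual pipeline: the function name `clean_parallel`, the choice to deduplicate on whole sentence pairs, and the placeholder strings are all assumptions made for illustration.

```python
def clean_parallel(en_lines, iu_lines):
    """Return aligned (English, Inuktitut) pairs with blanks and exact duplicates removed.

    Assumption: the two inputs are line-aligned, and a duplicate means the
    same pair of lines appearing again. The first occurrence is kept.
    """
    if len(en_lines) != len(iu_lines):
        raise ValueError("parallel inputs must have the same number of lines")

    seen = set()
    cleaned = []
    for en, iu in zip(en_lines, iu_lines):
        pair = (en.strip(), iu.strip())
        # Drop the pair if either side is blank, so alignment is preserved.
        if not pair[0] or not pair[1]:
            continue
        # Drop exact repeats of a pair already kept.
        if pair in seen:
            continue
        seen.add(pair)
        cleaned.append(pair)
    return cleaned


if __name__ == "__main__":
    # Placeholder lines only; these are not taken from the corpus.
    english = ["english line A", "", "english line A", "english line B"]
    inuktitut = ["inuktitut line A", "", "inuktitut line A", "inuktitut line B"]
    for en, iu in clean_parallel(english, inuktitut):
        print(en, "|||", iu)
```

Filtering on pairs rather than on each side separately keeps the two languages aligned line for line, which a parallel corpus depends on.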
Provider:
EdinburghNLP



