The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0
Source: DataCite Commons. Updated: 2025-11-21. Indexed: 2024-07-13.
Download link:
https://nrc-digital-repository.canada.ca/eng/view/object/?id=c7e34fa7-7629-43c2-bd6d-19b32bf64f60
Description:
The Inuktitut language, a member of the Inuit-Yupik-Unangan language family, is spoken across Arctic Canada and noted for its morphological complexity. It is an official language of two territories, Nunavut and the Northwest Territories, and has recognition in additional regions. This dataset is a newly released sentence-aligned Inuktitut–English corpus based on the proceedings of the Legislative Assembly of Nunavut, covering sessions from April 1999 to June 2017. With approximately 1.3 million aligned sentence pairs, this is, to our knowledge, the largest parallel corpus of a polysynthetic language, or an Indigenous language of the Americas, released to date. Accompanying the corpus is a subset of gold standard alignments for alignment evaluation purposes, and scripts to replicate the preprocessing used in our baseline machine translation experiments.
Provider:
National Research Council Canada
Created:
2020-01-22



