JRC-Acquis
Source: arXiv | Updated: 2006-09-12 | Indexed: 2024-08-01
Download link:
http://wt.jrc.it/lt/acquis/
Resource description:
JRC-Acquis is a freely available parallel corpus in more than 20 languages, consisting mainly of European Union documents of a legal nature. It covers all official EU languages as well as the languages of the EU candidate countries. Each language version contains nearly 8,000 documents, averaging nearly 9 million words per language. The corpus also provides paragraph-level alignment information for more than 190 language pairs. Its documents have been manually classified according to EUROVOC subject domains, so the collection can be used to train and test multi-label classification algorithms and keyword assignment software.
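The two figures in the description, "more than 20 languages" and "more than 190 language pairs", are related by simple counting: n languages give n(n-1)/2 unordered pairs, so 20 languages already yield 190. The following is a minimal sketch of that arithmetic only. The function names and the sample language codes are illustrative assumptions and are not taken from the corpus distribution.

```python
from itertools import combinations


def count_language_pairs(num_languages: int) -> int:
    """Number of unordered pairs among num_languages languages: n * (n - 1) / 2."""
    return num_languages * (num_languages - 1) // 2


def language_pairs(codes):
    """Enumerate the unordered pairs of the given language codes."""
    return list(combinations(sorted(codes), 2))


# Illustrative subset of language codes (an assumption, not the corpus's actual list).
sample_codes = ["en", "de", "fr", "pl"]

for pair in language_pairs(sample_codes):
    print(pair)

# Each language added beyond 20 raises the pair count further.
for n in (20, 21, 22):
    print(n, "languages ->", count_language_pairs(n), "pairs")
```

Because alignment is symmetric, each pair is counted once here; a tool that treats source and target direction separately would see twice as many combinations.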
Created:
2006-09-12



