MILPaC (Multilingual Indian Legal Parallel Corpus)

Name: MILPaC (Multilingual Indian Legal Parallel Corpus)
Creator: Indian Institute of Technology Kharagpur
Published: 2023-10-15 15:49:56
License: Not specified

Source: arXiv. Updated 2023-10-15; indexed 2024-08-06.

Download link:

http://arxiv.org/abs/2310.09765v1

Description:

MILPaC is a multilingual Indian legal parallel corpus developed by the Indian Institute of Technology Kharagpur. It comprises 17,853 parallel text pairs covering English and nine Indian languages, carefully compiled from reliable sources of Indian legal information and spanning languages from both the Indo-Aryan and Dravidian language families. MILPaC is designed to evaluate the performance of machine translation systems when translating English legal texts into various Indian languages or between different Indian languages, and can also be applied to other NLP tasks such as cross-lingual question answering. The dataset aims to address language barriers within India's judicial system and improve legal text accessibility for the Indian population.

Provided by:

Indian Institute of Technology Kharagpur

Created:

2023-10-15
