Bhinneka Korpus
Source: arXiv. Updated: 2024-04-01. Indexed: 2024-06-21.
Download link:
https://github.com/joanitolopo/bhinneka-korpus
Overview:
Bhinneka Korpus is a multilingual parallel corpus built by Gadjah Mada University that focuses on the local languages of Indonesia, particularly low-resource ones. The dataset contains 18,000 sentences covering five Indonesian local languages and aims to make these languages more available and usable in natural language processing. It was created with volunteer participation and double-blind evaluation to ensure data quality. Its applications include machine translation and the preservation of linguistic diversity. It is intended to address the scarcity of resources for Indonesian local languages and to support the development of multilingual translation models.
Provider:
Gadjah Mada University
Created:
2024-04-01



