BVS Corpus
Source: arXiv | Updated: 2019-05-06 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/1905.01712v1
Summary:
The BVS Corpus is a multilingual parallel corpus of biomedical scientific texts, developed by the Federal University of Rio de Janeiro and the Barcelona Supercomputing Center. It contains more than 1.7 million records in three languages: English, Portuguese, and Spanish. Sentences were aligned automatically, and the alignment quality was assessed through manual evaluation. The corpus is intended mainly for training neural machine translation (NMT) systems, with the aim of improving translation quality for biomedical texts.
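To illustrate what "automatic sentence alignment" means in practice, here is a minimal sketch of a length-based aligner in the spirit of Gale and Church (1993). It is not the tool used to build the BVS Corpus; the cost function is a simplified length-ratio heuristic rather than Gale and Church's probabilistic model, and the names `align`, `bead_cost`, `SKIP_PENALTY`, and `MERGE_PENALTY` (with their values) are hypothetical. Dynamic programming picks the cheapest sequence of 1-1, 1-0, 0-1, 2-1, and 1-2 sentence groupings, assuming that translated sentences have roughly proportional character lengths.

```python
from typing import List, Optional, Tuple

# Allowed alignment "beads": (source sentences consumed, target sentences consumed).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
SKIP_PENALTY = 10.0   # hypothetical cost of leaving one sentence unaligned
MERGE_PENALTY = 2.0   # hypothetical extra cost for 2-1 / 1-2 merges


def bead_cost(src_len: int, tgt_len: int, di: int, dj: int) -> float:
    """Simplified cost: normalized character-length difference plus penalties."""
    if di == 0 or dj == 0:
        return SKIP_PENALTY
    cost = abs(src_len - tgt_len) / max(src_len + tgt_len, 1) * 10
    if di + dj > 2:
        cost += MERGE_PENALTY
    return cost


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return aligned groups of (source indices, target indices) in document order."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Row-major order guarantees every cell is final before we extend from it.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                cost = best[i][j] + bead_cost(s_len, t_len, di, dj)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from (n, m) to (0, 0).
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        groups.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    groups.reverse()
    return groups
```

For example, two English sentences whose lengths closely match one Portuguese sentence are grouped as a single 2-1 bead. Real aligners typically add lexical cues (such as bilingual dictionaries) on top of length, because length alone can be ambiguous.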
Provided by:
Federal University of Rio de Janeiro; Barcelona Supercomputing Center
Created:
2019-05-06