
nhuvo/MedEV

Hugging Face | Updated 2024-03-29 | Indexed 2024-04-19
Download link:
https://hf-mirror.com/datasets/nhuvo/MedEV
Resource description:
---
task_categories:
- translation
language:
- en
- vi
---

# MedEV Dataset

## Introduction

The MedEV dataset targets machine translation for the Vietnamese-English pair in the medical field. It addresses the lack of high-quality Vietnamese-English parallel data by offering around 360K sentence pairs. The dataset is designed to support the advancement of machine translation in the medical domain and to help improve the precision and trustworthiness of medical translations between Vietnamese and English.

## Dataset Overview

- **Domain:** Medical
- **Language Pair:** Vietnamese-English
- **Size:** ~360,000 sentence pairs
- **Objective:** To support the development of machine translation models specifically tuned for the medical domain.
- **Training set:** 340,897 sentence pairs
- **Validation set:** 8,982 sentence pairs
- **Test set:** 9,006 sentence pairs

## Accessing the Dataset

The MedEV dataset is available for research purposes. [Please refer to our paper](https://arxiv.org/abs/2403.19161) for detailed information on the dataset construction, experimental setup, and analysis of results.

## Ethical Statement

Data were collected from publicly available websites, such as those of journals and universities, as well as from www.msd.com. Content extracted from these sources may not be used for public or commercial purposes. The content contains no private data about patients.

## Citing MedEV

If you find the MedEV dataset useful in your research, please consider citing our paper:

```
@inproceedings{medev,
  title = {{Improving Vietnamese-English Medical Machine Translation}},
  author = {Nhu Vo and Dat Quoc Nguyen and Dung D. Le and Massimo Piccardi and Wray Buntine},
  booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)},
  year = {2024}
}
```
Provided by:
nhuvo
Summary of original information

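MedEV Dataset overview

Basic dataset information

  • Task category: Translation
  • Languages: English (en), Vietnamese (vi)

Dataset details

  • Domain: Medical
  • Language pair: Vietnamese-English
  • Size: approximately 360,000 sentence pairs
  • Objective: support the development of machine translation models for the medical domain
  • Training set: 340,897 sentence pairs
  • Validation set: 8,982 sentence pairs
  • Test set: 9,006 sentence pairs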
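
The split sizes listed above can be cross-checked against the "approximately 360,000" figure with a few lines of arithmetic. The following is a minimal sketch: the `SPLITS` dictionary and the `split_summary` helper are illustrative names, not part of the dataset or any official tooling, and the counts are copied from the dataset card rather than read from the data files.

```python
# Split sizes as reported in the MedEV dataset card (not read from the data files).
SPLITS = {"train": 340_897, "validation": 8_982, "test": 9_006}


def split_summary(splits):
    """Return the total number of sentence pairs and each split's share of it.

    Hypothetical helper written for this illustration only.
    """
    total = sum(splits.values())
    shares = {name: count / total for name, count in splits.items()}
    return total, shares


total, shares = split_summary(SPLITS)
print(f"total sentence pairs: {total:,}")
for name, share in shares.items():
    print(f"{name:>10}: {SPLITS[name]:>7,} ({share:.2%})")
```

The three splits sum to 358,885 sentence pairs, which is consistent with the rounded figure of about 360K quoted in the card.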

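Dataset usage

The dataset aims to improve the precision and trustworthiness of machine translation in the medical domain by providing high-quality Vietnamese-English parallel data.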
