SLPG/Biomedical_EN_FR_Corpus
Source: Hugging Face. Updated: 2024-11-09. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/SLPG/Biomedical_EN_FR_Corpus
Description:
This is an English-French parallel corpus for the biomedical domain containing 6.3 million sentences. The data was scraped from Wikipedia, and parallel sentences were extracted at three similarity thresholds. A second, in-domain filtering pass was then applied to ensure the sentences are relevant to the biomedical field. The corpus comprises three datasets (Threshold-90, Threshold-85, and Threshold-80), each further subdivided by threshold. It is intended to support research and development in biomedical machine translation.
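The two-stage selection described above (similarity threshold, then in-domain filter) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the corpus authors' pipeline: the similarity scores are made-up values on a 0 to 1 scale, the keyword-based `is_in_domain` check is a hypothetical stand-in for whatever domain filter was actually used, and the subsets are assumed to be cumulative (a pair that passes 0.90 also appears in the 0.85 and 0.80 subsets).

```python
from typing import NamedTuple


class Pair(NamedTuple):
    en: str
    fr: str
    score: float  # hypothetical similarity score in [0, 1]


# Cutoffs assumed from the dataset names; the real scale is not documented here.
THRESHOLDS = {"Threshold-90": 0.90, "Threshold-85": 0.85, "Threshold-80": 0.80}

# Hypothetical stand-in for the in-domain filter: a tiny keyword list.
BIOMEDICAL_TERMS = {"insulin", "glucose", "protein", "virus"}


def is_in_domain(pair: Pair) -> bool:
    """Return True if the English side mentions any listed biomedical term."""
    words = {w.strip(".,;:").lower() for w in pair.en.split()}
    return bool(words & BIOMEDICAL_TERMS)


def build_subsets(pairs: list[Pair]) -> dict[str, list[Pair]]:
    """Keep pairs that meet each similarity cutoff and pass the domain check."""
    return {
        name: [p for p in pairs if p.score >= cutoff and is_in_domain(p)]
        for name, cutoff in THRESHOLDS.items()
    }


# Toy candidate pairs with invented scores, for illustration only.
candidates = [
    Pair("Insulin regulates blood glucose.", "L'insuline régule la glycémie.", 0.93),
    Pair("The virus infects liver cells.", "Le virus infecte les cellules du foie.", 0.87),
    Pair("The protein binds to the receptor.", "La protéine se lie au récepteur.", 0.81),
    Pair("The bridge was built in 1889.", "Le pont a été construit en 1889.", 0.95),
    Pair("Glucose is a simple sugar.", "Le glucose est un sucre simple.", 0.72),
]

subsets = build_subsets(candidates)
for name, kept in subsets.items():
    print(name, len(kept))
```

In this toy run, lowering the cutoff admits more pairs, while the high-scoring but off-topic sentence about the bridge is dropped by the domain check at every threshold.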
Provided by:
SLPG



