FAME-MT
Source: arXiv · Updated 2024-05-20 · Indexed 2024-06-21
Download link:
https://github.com/laniqo-public/fame-mt/
Description:
The FAME-MT dataset was created by Laniqo, a Polish company. It contains 11.2 million translations covering 112 European language pairs, and is intended to help machine translation systems produce target-language output in a chosen register, formal or informal. The dataset was built through a combination of automatic classification and manual annotation, and is suitable for fine-tuning machine translation models to meet the formality requirements of different language pairs. It is described as the largest formality-annotated dataset to date and is meant to improve machine translation quality in multilingual settings.
Provider:
Laniqo
Created:
2024-05-20



