
FAME-MT

Source: arXiv · Updated 2024-05-20 · Indexed 2024-06-21
Download link:
https://github.com/laniqo-public/fame-mt/
Resource description:
The FAME-MT dataset was created by Laniqo, a Polish company. It contains 11.2 million translations covering 112 European language pairs and is intended to help machine translation systems produce output in the desired register (formal or informal) in the target language. The dataset was built by combining automatic classification with manual annotation, and it can be used to fine-tune machine translation models so that they meet the formality requirements of different language pairs. It is currently the largest formality-annotated dataset and is meant to make machine translation more useful in multilingual settings.
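One common way to use formality-labelled translation pairs for fine-tuning is to prepend a control tag to each source sentence, so the model learns to condition its output register on that tag. The sketch below is a minimal illustration under stated assumptions: the record type `FormalityExample`, its fields (`source`, `target`, `target_lang`, `formality`) and the tag format `<2{lang}_{formality}>` are all hypothetical and do not come from the FAME-MT repository, whose actual file format should be checked before use. The German sentence pairs are illustrative only, not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class FormalityExample:
    # Hypothetical record layout; the real FAME-MT files may differ.
    source: str
    target: str
    target_lang: str
    formality: str  # assumed to be either "formal" or "informal"


def add_control_tag(ex: FormalityExample) -> tuple[str, str]:
    """Prepend a hypothetical target-language/formality tag to the source side."""
    if ex.formality not in ("formal", "informal"):
        raise ValueError(f"unexpected formality label: {ex.formality!r}")
    tag = f"<2{ex.target_lang}_{ex.formality}>"
    # The target side is left unchanged; only the input carries the control signal.
    return f"{tag} {ex.source}", ex.target


# Illustrative pairs (not from the dataset): same English input, two registers.
examples = [
    FormalityExample("How are you?", "Wie geht es Ihnen?", "de", "formal"),
    FormalityExample("How are you?", "Wie geht es dir?", "de", "informal"),
]

pairs = [add_control_tag(ex) for ex in examples]
for src, tgt in pairs:
    print(src, "=>", tgt)
```

At inference time, the same kind of tag would be prepended to new input to request a formal or informal translation.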
Provider:
Laniqo
Created:
2024-05-20