FGraDA

Name: FGraDA
Creator: State Key Laboratory for Novel Software Technology, Nanjing University
Published: 2021-11-07 13:23:47
License: Not specified

Source: arXiv; updated 2021-11-07; indexed 2024-06-21

Download link:

https://github.com/OwenNJU/FGraDA

Description:

The FGraDA dataset was created by the State Key Laboratory for Novel Software Technology at Nanjing University and targets fine-grained domain adaptation in machine translation. It covers Chinese-English translation tasks in four information technology subdomains: autonomous driving, AI education, real-time networking, and smartphones. Each subdomain comes with a development set and a test set for evaluating translation quality. FGraDA provides no in-domain bilingual training data. Instead, it supplies bilingual dictionaries and wiki knowledge bases, reflecting real-world settings in which resources must be acquired quickly and cheaply. Intended applications include providing translation services for specific international conferences and addressing the scarcity of data resources within specific domains.

Provider:

State Key Laboratory for Novel Software Technology, Nanjing University

Created:

2021-01-01
