OSN-MDAD
Source: arXiv. Updated 2023-09-21. Indexed 2024-08-06.
Download link:
http://arxiv.org/abs/2309.12137v1
Description:
The OSN-MDAD dataset was created by the Multimedia Communications Research Laboratory in the School of Electrical Engineering and Computer Science at the University of Ottawa to address machine translation of multi-dialectal Arabic on social media. It was built by translating English tweets into four Arabic dialects (Gulf, Yemeni, Iraqi, and Levantine/Shami), with particular attention to the informal language and cultural expressions found on social media. Translators followed strict guidelines intended to keep the translated content culturally relevant and accurate. The dataset is aimed primarily at improving machine translation quality for multi-dialectal Arabic, in order to better understand and analyze Arabic content on social media.
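Provided by:
Multimedia Communications Research Laboratory, School of Electrical Engineering and Computer Science, University of Ottawa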
Created:
2023-09-21



