ArzEn
Source: arXiv. Indexed 2025-09-30.
Download link:
http://github.com/ahmedheakl/arazn-llm
Description:
ArzEn is a dataset of code-switched Egyptian Arabic-English data intended for machine translation and automatic speech recognition. It is used to evaluate a range of models, including LLaMa3 and Whisper, with particular attention to the challenges posed by the Egyptian Arabic dialect and by code-switching.



