SauDial: The Saudi Arabic Dialects Game Localization Dataset
Source: DataCite Commons. Updated: 2025-04-01. Indexed: 2025-04-16.
Download link:
https://data.mendeley.com/datasets/mzdwkb2t6d
Description:
SauDial: The Saudi Arabic Dialects Game Localization Dataset is a curated collection of parallel text samples designed for video game localization. It features content in English, Modern Standard Arabic (MSA), and four major Saudi dialects: Najdi, Hijazi, Janoubi, and Eastern. The dataset was initially generated using the OpenAI GPT-4o model and refined with pre-compiled dialect-specific resources.
Each entry in the dataset includes:
- Original English text
- MSA translation
- Dialectal translation
- Game context and age rating information
- Linguistic notes on dialect features
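The entry structure listed above could be modelled roughly as follows. This is only a sketch: the field names, the `SauDialEntry` class, and the helpers `by_dialect` and `to_tm_pairs` are hypothetical and not taken from the dataset files, whose actual column headers should be checked before use. Placeholder strings stand in for real translations.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# The four Saudi dialects named in the dataset description.
DIALECTS = ("Najdi", "Hijazi", "Janoubi", "Eastern")


@dataclass(frozen=True)
class SauDialEntry:
    """One parallel entry. Field names are assumptions, not the real schema."""
    english: str           # original English text
    msa: str               # Modern Standard Arabic translation
    dialect: str           # which dialect the dialectal translation is in
    dialect_text: str      # dialectal translation
    game_context: str      # game context (genre, scenario, tone)
    age_rating: str        # age rating information
    linguistic_notes: str  # notes on dialect features

    def __post_init__(self) -> None:
        # Reject dialect labels the description does not mention.
        if self.dialect not in DIALECTS:
            raise ValueError(f"unknown dialect: {self.dialect!r}")


def by_dialect(entries: Iterable[SauDialEntry], dialect: str) -> List[SauDialEntry]:
    """Return only the entries translated into the given dialect."""
    return [e for e in entries if e.dialect == dialect]


def to_tm_pairs(entries: Iterable[SauDialEntry]) -> List[Tuple[str, str]]:
    """Build (source, target) pairs, e.g. as input to a translation memory."""
    return [(e.english, e.dialect_text) for e in entries]


if __name__ == "__main__":
    # Placeholder values only; no real dataset content is reproduced here.
    sample = [
        SauDialEntry("Press start", "<MSA text>", "Najdi", "<Najdi text>",
                     "<context>", "<rating>", "<notes>"),
        SauDialEntry("Press start", "<MSA text>", "Hijazi", "<Hijazi text>",
                     "<context>", "<rating>", "<notes>"),
    ]
    print(to_tm_pairs(by_dialect(sample, "Najdi")))
```

Keeping the dialect as an explicit field, rather than one column per dialect, is a design choice made for this sketch; the published files may be laid out differently.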
The content covers various game genres, scenario types, tones, and age ratings, making it versatile for different game development needs. The dataset underwent thorough cleaning and editing to ensure dialectal accuracy, tonal appropriateness, and cultural fidelity.
This resource is valuable for:
- Game developers and localization teams
- Researchers in translation, cultural, localization, and game studies
- Training and fine-tuning Large Language Models (LLMs)
- Educational purposes in translation and localization studies
- Professional translators and localizers as a specialized translation memory (TM)
The dataset aims to streamline game localization processes and enhance the authenticity of Arabic language representation in video games, particularly for the Saudi market.
Provider:
Mendeley Data
Created:
2024-09-16



