HeshamHaroon/ArzEn-MultiGenre
Source: Hugging Face. Updated: 2023-12-31. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/HeshamHaroon/ArzEn-MultiGenre
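Description:
ArzEn-MultiGenre is a distinctive parallel dataset covering diverse Egyptian Arabic content, including song lyrics, novels, and TV show subtitles, all carefully translated and aligned with their English counterparts. The dataset is a valuable resource for a range of linguistic and computational applications.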
Provided by:
HeshamHaroon
Summary of original information
ArzEn-MultiGenre: A Comprehensive Parallel Dataset
Overview
ArzEn-MultiGenre is a distinctive parallel dataset that encompasses a diverse collection of Egyptian Arabic content, including song lyrics, novels, and TV show subtitles, all translated and aligned with their English counterparts.
Dataset Details
- Total Segment Pairs: 25,557
- Languages: Egyptian Arabic and English
- Content Types: Song Lyrics, Novels, TV Show Subtitles
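
As a rough illustration of how the dataset might be fetched, here is a minimal sketch using the Hugging Face `datasets` library. The repository id comes from the page title and the mirror endpoint from the download link above. The helper names `dataset_url` and `load_arzen` are hypothetical. The sketch assumes the repository loads with its default configuration; this page does not state the split or column names, so inspect the returned object rather than relying on any particular layout.

```python
import os

# Repository id taken from the page title; mirror endpoint from the download link.
REPO_ID = "HeshamHaroon/ArzEn-MultiGenre"
MIRROR_ENDPOINT = "https://hf-mirror.com"


def dataset_url(repo_id: str = REPO_ID, endpoint: str = MIRROR_ENDPOINT) -> str:
    """Build the dataset page URL for a given Hub endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_arzen(use_mirror: bool = False):
    """Load the dataset with the `datasets` library (pip install datasets).

    Assumption: the repository loads with its default configuration. Split and
    column names are not documented on this page, so inspect the result.
    """
    if use_mirror:
        # HF_ENDPOINT is read when huggingface_hub is first imported, so it
        # only takes effect if set before that import happens.
        os.environ["HF_ENDPOINT"] = MIRROR_ENDPOINT
    # Imported lazily so the URL helper works without the library or network.
    from datasets import load_dataset
    return load_dataset(REPO_ID)
```

Calling `load_arzen()` requires network access; printing the returned object lists whatever splits, columns, and row counts the repository actually exposes, which can then be checked against the 25,557 segment pairs reported above.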
Applications
- Machine Translation Benchmarking
- Language Model Fine-Tuning
- Commercial Application Adaptation
Research Relevance
The dataset is relevant to research in translation studies, cross-linguistic analysis, and lexical semantics.
Unique Contributions
- Diverse Textual Genres
- Gold-Standard Quality
Citation
Al-Sabbagh, Rania (2023). “ArzEn-MultiGenre: An aligned parallel dataset of Egyptian Arabic song lyrics, novels, and subtitles, with English translations.” Mendeley Data, V3, DOI: 10.17632/6k97jty9xg.3



