DatarrX/Myanmar-Written-Spoken-Parallel-Corpus
Source: Hugging Face. Updated 2026-04-24; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/DatarrX/Myanmar-Written-Spoken-Parallel-Corpus
Overview:
The Myanmar Written-Spoken Parallel Corpus (MWSPC) is a curated open-source dataset that pairs formal written Burmese with everyday spoken Burmese. Its aim is to help AI models produce natural-sounding output that reflects the differences between the two registers. The dataset contains 5,555 rows of parallel text pairs, filtered so that every row is unique. It is curated by Khant Sint Heinn (Kalix Louis) and published by DatarrX under the Creative Commons Attribution 4.0 International (CC-BY-4.0) license. Its primary uses include style transfer, machine translation, and preprocessing.
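The uniqueness claim can be checked locally once the rows are in hand. The following is a minimal sketch of such a check, not the curators' actual filtering procedure. The field names `written` and `spoken` and the sample rows are hypothetical placeholders, since this listing does not state the dataset's column names.

```python
import unicodedata


def normalize(text):
    """Apply Unicode NFC normalization and collapse whitespace so that
    trivially different encodings of the same sentence compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def find_duplicate_pairs(rows, written_key="written", spoken_key="spoken"):
    """Return the indices of rows whose (written, spoken) pair was already seen.

    The key names are assumptions; replace them with the real column names.
    """
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        pair = (normalize(row[written_key]), normalize(row[spoken_key]))
        if pair in seen:
            duplicates.append(index)
        else:
            seen.add(pair)
    return duplicates


# Hypothetical placeholder rows, not taken from the dataset.
sample_rows = [
    {"written": "formal sentence A", "spoken": "casual sentence A"},
    {"written": "formal sentence B", "spoken": "casual sentence B"},
    {"written": "formal  sentence A", "spoken": "casual sentence A"},
]

duplicate_indices = find_duplicate_pairs(sample_rows)
print(duplicate_indices)
```

Run against the full corpus, an empty result would be consistent with the stated uniqueness of the 5,555 pairs.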
Provider:
DatarrX



