
DatarrX/Myanmar-Written-Spoken-Parallel-Corpus

Source: Hugging Face. Updated: 2026-04-24. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/DatarrX/Myanmar-Written-Spoken-Parallel-Corpus
Description:
The Myanmar Written-Spoken Parallel Corpus (MWSPC) is a high-quality open-source dataset designed to bridge the gap between formal written Burmese and everyday spoken Burmese, so that AI models can produce natural-sounding output that reflects the linguistic nuances of the Myanmar language. It contains 5,555 rows of parallel text pairs, filtered so that every row is unique. The corpus is curated by Khant Sint Heinn (Kalix Louis) and published by DatarrX under the Creative Commons Attribution 4.0 International (CC-BY-4.0) license. Its primary uses include style transfer, machine translation, and preprocessing.
Provider:
DatarrX
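
The uniqueness claim in the description can be checked with a few lines of Python. What follows is a minimal sketch, not the curator's filtering method. The column names `written` and `spoken` and the `train` split are assumptions, since this listing does not state the schema; adjust them to whatever the dataset actually exposes. The toy rows are placeholders, not corpus data.

```python
from collections import Counter


def duplicate_pairs(rows, written_key="written", spoken_key="spoken"):
    """Return every (written, spoken) pair that occurs more than once, with its count.

    The default key names are hypothetical; pass the real column names.
    """
    counts = Counter((row[written_key], row[spoken_key]) for row in rows)
    return {pair: n for pair, n in counts.items() if n > 1}


def load_rows(repo_id="DatarrX/Myanmar-Written-Spoken-Parallel-Corpus", split="train"):
    """Load the corpus with the Hugging Face `datasets` library (needs network access).

    The split name is an assumption.
    """
    from datasets import load_dataset  # imported lazily so the check above runs offline

    return load_dataset(repo_id, split=split)


# Toy placeholder rows, not taken from the corpus.
sample = [
    {"written": "formal sentence A", "spoken": "casual sentence A"},
    {"written": "formal sentence B", "spoken": "casual sentence B"},
    {"written": "formal sentence A", "spoken": "casual sentence A"},
]
duplicates = duplicate_pairs(sample)
```

On the real corpus, `len(load_rows())` should equal 5,555 and `duplicate_pairs(load_rows(), ...)` should come back empty if the description holds.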