hksamm/Burmese-English-Code-Mixed-Corpus
Source: Hugging Face. Updated 2026-04-27. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/hksamm/Burmese-English-Code-Mixed-Corpus
Description:
The Burmese-English Code-Mixed Corpus is a high-quality, human-curated dataset of 1,111 unique sentences that reflect contemporary language use in Myanmar's digital society. Unlike formal datasets, it captures the natural flow of conversation, in which English terms are embedded in Burmese sentence structures. Each sentence has been manually curated and verified for three properties: 100% correct spelling (strict adherence to Myanmar Unicode standards), grammatical integrity (the style is casual and conversational, but the underlying Burmese grammar is preserved), and authentic social context (reflecting real-world usage in social media, tech discussions, and daily life). The dataset is designed for Natural Language Processing and Machine Learning research and can be used for large language model fine-tuning, sentiment analysis, machine translation, speech recognition, and part-of-speech tagging.
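To make "code-mixed" concrete, here is a minimal sketch of one way to check whether a sentence contains both Burmese-script and Latin-script characters. It is not part of the dataset or its tooling. The function name `is_code_mixed` and the sample sentence are hypothetical and not taken from the corpus, and the check relies only on the Unicode ranges for the Myanmar block and its Extended-A and Extended-B blocks.

```python
import re

# Myanmar (U+1000-U+109F), Myanmar Extended-B (U+A9E0-U+A9FF),
# and Myanmar Extended-A (U+AA60-U+AA7F).
MYANMAR_RE = re.compile(r"[\u1000-\u109F\uA9E0-\uA9FF\uAA60-\uAA7F]")
# Basic Latin letters only; an assumption that English terms are plain ASCII.
LATIN_RE = re.compile(r"[A-Za-z]")


def is_code_mixed(sentence: str) -> bool:
    """Return True if the sentence has both Myanmar-script and Latin letters."""
    has_myanmar = MYANMAR_RE.search(sentence) is not None
    has_latin = LATIN_RE.search(sentence) is not None
    return has_myanmar and has_latin


# Hypothetical example sentence (not from the corpus).
sample = "ဒီနေ့ meeting ရှိတယ်"
mixed = is_code_mixed(sample)
```

A script-level check like this says nothing about spelling or grammar. It only flags sentences where the two scripts co-occur, which can be a quick sanity filter before fine-tuning or tagging.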
Provider:
hksamm



