
DatarrX/Myanmar-Style-Classification-Corpus

Source: Hugging Face | Updated: 2026-04-24 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/DatarrX/Myanmar-Style-Classification-Corpus
Description:
The Myanmar Style Classification Corpus (MSCC) is a specialized dataset for binary text classification, designed to help AI models distinguish between Written Style (Formal) and Spoken Style (Informal) Burmese text. This dataset is a reconstructed version of the Myanmar Written-Spoken Parallel Corpus (MWSPC), reformatted to support supervised learning tasks such as style detection and linguistic analysis. The dataset contains 11,110 rows of unique Burmese sentences, each with a text field and a label field (0 for Written Style, 1 for Spoken Style). Every entry in this corpus has been strictly filtered to ensure 100% uniqueness, with 5,555 unique written sentences and 5,555 unique spoken sentences. The dataset can be used for style detection, grammar and style checking, and data filtering tasks.
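
The schema described above (a text field plus a 0/1 label) can be sanity-checked with a few lines of Python. The following is a minimal sketch, not an official loader: the column names `text` and `label`, the split name `train`, and the helper names `summarize` and `load_rows` are assumptions made for illustration, and the sample rows are placeholders rather than real corpus entries.

```python
from collections import Counter

# Label convention stated in the dataset description.
LABEL_NAMES = {0: "written", 1: "spoken"}


def summarize(rows):
    """Count rows per style and check that every sentence is unique.

    `rows` is any iterable of dicts with "text" and "label" keys
    (assumed column names).
    """
    rows = list(rows)
    counts = Counter(LABEL_NAMES[row["label"]] for row in rows)
    texts = [row["text"] for row in rows]
    return {
        "total": len(rows),
        "written": counts["written"],
        "spoken": counts["spoken"],
        "all_unique": len(set(texts)) == len(texts),
    }


def load_rows(split="train"):
    """Load the corpus from the Hugging Face Hub.

    Needs `pip install datasets` and network access; the split name
    "train" is an assumption.
    """
    from datasets import load_dataset

    return load_dataset(
        "DatarrX/Myanmar-Style-Classification-Corpus", split=split
    )


# Tiny hand-made sample (placeholder strings, not real corpus rows).
sample = [
    {"text": "written sentence A", "label": 0},
    {"text": "spoken sentence A", "label": 1},
]
print(summarize(sample))
```

If the description holds, running `summarize(load_rows())` on the full corpus should report 11,110 rows in total, 5,555 per style, with `all_unique` set to `True`.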
Provider:
DatarrX