DatarrX/Myanmar-Style-Classification-Corpus

Name: DatarrX/Myanmar-Style-Classification-Corpus
Creator: DatarrX
Published: 2026-04-24 11:00:09
License: No description available

Source: Hugging Face (updated 2026-04-24, indexed 2026-04-26)

Download link:

https://hf-mirror.com/datasets/DatarrX/Myanmar-Style-Classification-Corpus


Description:

The Myanmar Style Classification Corpus (MSCC) is a dataset for binary text classification, designed to help AI models distinguish between Written Style (formal) and Spoken Style (informal) Burmese text. It is a reconstructed version of the Myanmar Written-Spoken Parallel Corpus (MWSPC), reformatted to support supervised learning tasks such as style detection and linguistic analysis.

The dataset contains 11,110 rows of unique Burmese sentences. Each row has a text field and a label field (0 = Written Style, 1 = Spoken Style). Every entry has been strictly filtered to ensure 100% uniqueness, and the two classes are balanced: 5,555 unique written sentences and 5,555 unique spoken sentences.

Typical uses include style detection, grammar and style checking, and data filtering.
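The row format and label convention described above are enough to sketch a sanity check and a simple style-detection baseline. This is a minimal sketch, assuming rows have already been loaded as plain Python dicts with "text" and "label" keys (loading itself is left out). The helper names check_rows, char_ngrams and StyleNB, the character-trigram features, and the toy sentences are hypothetical illustrations, not part of the dataset or any official tooling. The toy pairs only contrast the written sentence-final particle သည် with its spoken counterpart တယ်; they are not rows from the corpus.

```python
import math
from collections import Counter

# Label convention from the dataset card: 0 = Written Style, 1 = Spoken Style.
WRITTEN, SPOKEN = 0, 1


def check_rows(rows):
    """Validate rows shaped like {"text": str, "label": int}.

    Raises ValueError on an unknown label or a repeated text (the card states
    100% uniqueness); otherwise returns a Counter of rows per label.
    """
    seen = set()
    per_label = Counter()
    for row in rows:
        text, label = row["text"], row["label"]
        if label not in (WRITTEN, SPOKEN):
            raise ValueError(f"unexpected label: {label!r}")
        if text in seen:
            raise ValueError(f"duplicate text: {text!r}")
        seen.add(text)
        per_label[label] += 1
    return per_label


def char_ngrams(text, n=3):
    """Overlapping character n-grams of the space-padded text.

    Character features avoid needing a Burmese word segmenter.
    """
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class StyleNB:
    """Hypothetical baseline: multinomial Naive Bayes over character n-grams
    with add-one (Laplace) smoothing. Assumes both labels occur in training."""

    def __init__(self, n=3):
        self.n = n

    def fit(self, rows):
        self.doc_counts = Counter()
        self.gram_counts = {WRITTEN: Counter(), SPOKEN: Counter()}
        for row in rows:
            self.doc_counts[row["label"]] += 1
            self.gram_counts[row["label"]].update(char_ngrams(row["text"], self.n))
        vocab = set(self.gram_counts[WRITTEN]) | set(self.gram_counts[SPOKEN])
        self.vocab_size = len(vocab)
        self.gram_totals = {c: sum(g.values()) for c, g in self.gram_counts.items()}
        return self

    def log_score(self, text, label):
        """Log prior plus smoothed log likelihood of the text's n-grams."""
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = self.gram_totals[label] + self.vocab_size
        for gram in char_ngrams(text, self.n):
            score += math.log((self.gram_counts[label][gram] + 1) / denom)
        return score

    def predict(self, text):
        """Return the label (0 or 1) with the highest log score."""
        return max((WRITTEN, SPOKEN), key=lambda c: self.log_score(text, c))


if __name__ == "__main__":
    # Toy parallel pairs (NOT rows from the corpus): the same stem ends in the
    # written particle သည် or its spoken counterpart တယ်.
    stems = ["ကျွန်တော် စာအုပ် ဖတ်", "သူ ဈေး သွား"]
    rows = [{"text": s + "သည်။", "label": WRITTEN} for s in stems]
    rows += [{"text": s + "တယ်။", "label": SPOKEN} for s in stems]
    print(dict(check_rows(rows)))  # {0: 2, 1: 2}
    model = StyleNB().fit(rows)
    print(model.predict("မောင်မောင် ထမင်း စားတယ်။"))  # 1 (Spoken Style)
```

Because the corpus is derived from a parallel written/spoken corpus, many pairs likely share content and differ mainly in function words and particles, which is why short character n-grams are a reasonable first feature set here. For real experiments, a library vectorizer with character analyzers and a proper train/test split would replace this toy model.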

Provided by:

DatarrX