DipeshChaudhary/muril-nepali-gector-style-token-level-tag-for-ged
Source: Hugging Face | Updated: 2025-11-08 | Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/DipeshChaudhary/muril-nepali-gector-style-token-level-tag-for-ged
Description:
This is a Nepali grammatical error correction (GEC) dataset for training GECToR-style sequence tagging models. A multi-pass, content-aware alignment algorithm generates the token-level correction tags, including complex and adjacent SWAP operations. The dataset includes both correct and incorrect sentences for model stability, and its splits are stratified by each sentence's dominant error tag so that evaluation is balanced. It uses an extended 10-tag scheme: KEEP, DELETE, REPLACE, APPEND, SWAP_NEXT, SWAP_PREV, MERGE_NEXT, MERGE_PREV, SPLIT, and UNKNOWN.
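To make the 10-tag scheme concrete, here is a minimal Python sketch of how one tag per token could be applied to turn an erroneous sentence into a corrected one. It is an illustration under stated assumptions, not the dataset author's implementation: the function name `apply_tags`, the `(operation, payload)` tuple encoding, the use of a character offset as the SPLIT payload, and the English toy tokens are all hypothetical. The dataset's actual column names and tag encoding may differ.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (operation, optional payload). The real dataset may
# store tags differently (for example as single strings with a suffix).
Tag = Tuple[str, Optional[str]]


def apply_tags(tokens: List[str], tags: List[Tag]) -> List[str]:
    """Apply one edit tag per source token and return the corrected tokens."""
    if len(tokens) != len(tags):
        raise ValueError("one tag per token is required")
    out: List[str] = []
    skip = False  # True when the previous tag already consumed this token
    for i, (tok, (op, arg)) in enumerate(zip(tokens, tags)):
        if skip:
            skip = False
            continue
        has_next = i + 1 < len(tokens)
        if op == "DELETE":
            continue
        elif op == "REPLACE":
            out.append(arg if arg is not None else tok)
        elif op == "APPEND":
            out.append(tok)
            if arg is not None:
                out.append(arg)
        elif op == "SWAP_NEXT" and has_next:
            # Emit the pair in reversed order; the next token's own tag
            # (assumed to be the matching SWAP_PREV) is then skipped.
            out.extend([tokens[i + 1], tok])
            skip = True
        elif op == "SWAP_PREV" and out:
            # Place this token before the most recently emitted one.
            out.insert(len(out) - 1, tok)
        elif op == "MERGE_NEXT" and has_next:
            out.append(tok + tokens[i + 1])
            skip = True
        elif op == "MERGE_PREV" and out:
            out[-1] = out[-1] + tok
        elif op == "SPLIT" and arg is not None:
            cut = int(arg)  # assumed payload: character offset of the split
            out.extend([tok[:cut], tok[cut:]])
        else:
            # KEEP, UNKNOWN, or an edit that cannot be applied at this position.
            out.append(tok)
    return out


tokens = ["cat", "the", "sat", "sat", "on", "mat", "to", "day"]
tags: List[Tag] = [
    ("SWAP_NEXT", None), ("SWAP_PREV", None), ("KEEP", None), ("DELETE", None),
    ("APPEND", "the"), ("KEEP", None), ("MERGE_NEXT", None), ("MERGE_PREV", None),
]
print(apply_tags(tokens, tags))
# ['the', 'cat', 'sat', 'on', 'the', 'mat', 'today']
```

In this sketch a SWAP_NEXT or MERGE_NEXT tag consumes the following token, so the paired SWAP_PREV or MERGE_PREV tag on that token is not applied a second time. UNKNOWN is treated like KEEP, which is one possible choice rather than something the description specifies.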
Provided by:
DipeshChaudhary



