Diorrock/python-typos-109k
Hugging Face · Updated 2025-12-17 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Diorrock/python-typos-109k
Description:
Python Typos 500K is a dataset of 499,257 typo patterns commonly made when writing Python code, making it larger than the GitHub Typo Corpus (350K). It covers 150+ unique words and is 21 MB in size. The typos are generated by simulating keyboard input errors: neighbor-key replacement, adjacent-character swaps, character duplication, character omission, and combinations of multiple mutations. The dataset can be used to train autocorrect models and to build IDE plugins, code-quality tools, and mobile coding assistants.
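The generation method is described only at the level of mutation types. As a rough illustration, here is a minimal Python sketch of each listed mutation. It is not the dataset author's generator: the simplified QWERTY map `QWERTY_NEIGHBORS`, all helper names, and the choice of two to three steps for combined mutations are assumptions made for this example.

```python
import random

# ASSUMPTION: simplified, lowercase-only QWERTY adjacency map.
# The keyboard model actually used to build the dataset is not documented.
QWERTY_NEIGHBORS = {
    "q": "wa", "w": "qeas", "e": "wrsd", "r": "etdf", "t": "ryfg",
    "y": "tugh", "u": "yihj", "i": "uojk", "o": "ipkl", "p": "ol",
    "a": "qwsz", "s": "awedxz", "d": "serfcx", "f": "drtgvc",
    "g": "ftyhbv", "h": "gyujnb", "j": "huikmn", "k": "jiolm",
    "l": "kop", "z": "asx", "x": "zsdc", "c": "xdfv", "v": "cfgb",
    "b": "vghn", "n": "bhjm", "m": "njk",
}


def neighbor_replace(word: str, rng: random.Random) -> str:
    """Replace one letter with an adjacent key, preserving its case."""
    positions = [i for i, ch in enumerate(word) if ch.lower() in QWERTY_NEIGHBORS]
    if not positions:
        return word
    i = rng.choice(positions)
    repl = rng.choice(QWERTY_NEIGHBORS[word[i].lower()])
    if word[i].isupper():
        repl = repl.upper()
    return word[:i] + repl + word[i + 1:]


def swap_adjacent(word: str, rng: random.Random) -> str:
    """Swap two neighboring characters, e.g. 'print' -> 'pirnt'."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def duplicate_char(word: str, rng: random.Random) -> str:
    """Type one character twice, e.g. 'import' -> 'impport'."""
    if not word:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i] + word[i:]


def omit_char(word: str, rng: random.Random) -> str:
    """Drop one character, e.g. 'return' -> 'retrn'."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


MUTATIONS = [neighbor_replace, swap_adjacent, duplicate_char, omit_char]


def combined(word: str, rng: random.Random, max_steps: int = 3) -> str:
    """Apply 2..max_steps randomly chosen single mutations in sequence."""
    for _ in range(rng.randint(2, max(2, max_steps))):
        word = rng.choice(MUTATIONS)(word, rng)
    return word


def generate_typos(word: str, n: int, seed: int = 0) -> list[str]:
    """Return up to n distinct typo variants of word, never word itself."""
    rng = random.Random(seed)
    strategies = MUTATIONS + [combined]
    typos: set[str] = set()
    attempts = 0
    # Cap attempts so very short words cannot loop forever.
    while len(typos) < n and attempts < n * 50:
        attempts += 1
        candidate = rng.choice(strategies)(word, rng)
        if candidate != word:
            typos.add(candidate)
    return sorted(typos)


if __name__ == "__main__":
    print(generate_typos("import", 5))
```

A fixed `seed` makes the output reproducible. Pairing each generated typo with its source word gives (typo, correction) pairs of the general kind an autocorrect model could be trained on.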
Provided by:
Diorrock